
Import cross sections from csv #41

Draft
wants to merge 28 commits into
base: master
Changes from 15 commits
Commits
0a0270c
add test for process_pdos
Jun 23, 2021
d5c8552
fix the imports
Yaxuan-Lii Jun 29, 2021
553738f
removeed irrelevant files
Yaxuan-Lii Jul 1, 2021
5300d62
import is fixed
Yaxuan-Lii Jul 1, 2021
dab091b
try to remove irrelevant fils second time
Yaxuan-Lii Jul 1, 2021
d69d61b
remove irrelevant files second time
Yaxuan-Lii Jul 1, 2021
7984c15
delet irrelevant files
Yaxuan-Lii Jul 1, 2021
66b1c6f
remove irrelevant files
Yaxuan-Lii Jul 1, 2021
4fdafa7
delete .DS_Store
Yaxuan-Lii Jul 1, 2021
dca2d49
Modified test_process_pdos.py
Yaxuan-Lii Jul 2, 2021
eb079f6
add test.py
Yaxuan-Lii Jul 2, 2021
8190183
Restore some files that were accidentally deleted
ajjackson Jul 2, 2021
946efaa
use flake8 to optimise format
Yaxuan-Lii Jul 5, 2021
cd0f3c0
import cross-sections from CSV archives
Yaxuan-Lii Jul 15, 2021
8f48e5e
form modify
Yaxuan-Lii Jul 17, 2021
a959150
modify follow the comments
Yaxuan-Lii Jul 21, 2021
067c289
make corrections
Yaxuan-Lii Jul 21, 2021
a621ab0
make correction
Yaxuan-Lii Jul 21, 2021
13911d0
Revert changes to galore/__init__.py
ajjackson Jul 21, 2021
4fb993d
modified as comments
Yaxuan-Lii Aug 2, 2021
cd4dc0e
merge the change of galore/__init__.py
Yaxuan-Lii Aug 2, 2021
5c9c31c
add cli to install data and get cross sections
Yaxuan-Lii Aug 9, 2021
b138c4b
merge get_cross_sections_from_csv into get_cross_sections
Yaxuan-Lii Aug 10, 2021
bb8a526
modify and add new test
Yaxuan-Lii Aug 24, 2021
5cdb67b
add a IF statement
Yaxuan-Lii Aug 31, 2021
39651d5
modify as comments
Yaxuan-Lii Sep 9, 2021
a2441b5
modify as comments
Yaxuan-Lii Sep 9, 2021
f59b50a
mistakes fix
Yaxuan-Lii Sep 12, 2021
5 changes: 3 additions & 2 deletions galore/__init__.py
@@ -26,12 +26,13 @@
from collections.abc import Sequence
import logging

import galore.formats
from math import sqrt, log
import numpy as np
from scipy.interpolate import interp1d

import galore.formats
from galore.cross_sections import cross_sections_info

from galore.cross_sections import get_cross_sections, cross_sections_info


def auto_limits(data_1d, padding=0.05):
366 changes: 366 additions & 0 deletions galore/cross_sections.py
@@ -281,3 +281,369 @@ def _eval_fit(energy, coeffs):
orb: _eval_fit(energy, np_fromstr(coeffs))
for orb, coeffs in orbitals_fits}})
return el_cross_sections



import tarfile
import numpy as np
def read_csv_file(tar_file_name,file_path):
    '''read csv file
    Input: the file name
    Output: main matrix of each file'''

    ###Open tarfile
    with tarfile.open(tar_file_name) as tf:
        with tf.extractfile(file_path) as hello:
            data = hello.read().decode()
            a = data.split('\r\n')
Member commented:

Use descriptive names for variables. It is hard to read a line of code operating on a, b, c, and d and understand what it is supposed to be doing.

Member commented:

The new name data_string is a bit better because it is at least "greppable". But it's also a bit misleading because data_string is not actually a string, it's a list. Maybe something like data_lines would be better, as this conveys how it was split?

Member commented:

data_strings would at least be better. If you read data_string[0] it looks like it indexes a single letter from a string. Whereas data_strings[0] clearly gets a longer string, which can be split.
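
To illustrate the naming point in the comments above, here is a minimal sketch rather than the code in this PR. The helper names `split_csv_lines` and `count_column_headers` and the sample text are hypothetical, and `str.splitlines()` is used instead of `split('\r\n')` on the assumption that not every archive is guaranteed to use Windows line endings.

```python
def split_csv_lines(data):
    """Split decoded CSV text into a list of lines, whatever the line ending."""
    return data.splitlines()


def count_column_headers(data_lines):
    """Count the non-empty cells in the first line (the header row)."""
    return len([cell for cell in data_lines[0].split(',') if cell != ''])


# Hypothetical sample with trailing empty cells, as spreadsheets often export
sample = "PhotonEnergy,1s1/2,2s1/2,,\r\n1000,0.5,0.1,,\r\n"
data_lines = split_csv_lines(sample)
n_columns = count_column_headers(data_lines)
```

With a plural name, `data_lines[0]` reads as "the first line" rather than "the first character of a string".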


    ###get number of elements of each row
    a0 = a[0].split(',')
    new_a0 = [i for i in a0 if i !='']
    lenth = len(new_a0)

    ###build main matrix
    result = []
    for i in range(len(a)):
        c = a[i].split(',')[0:lenth]
        result.append(c)

    ###delete needless elements
    d = result[-2]
    result1 = [i for i in result if i!=d]
    new_result = [i for i in result if i!=d][0:-2]

    ###build dict
    dic={}
    dic['headers'] = new_result[0]
    dic['electron_counts'] = [i for i in result1[-2] if i !=''][1:]
    dic['data_table'] = new_result[1:]


    return dic



def _cross_sections_from_csv_data(energy,data,reference):

    ## replace '' with nan
    for i in range(len(data['data_table'])):
        data['data_table'][i] = [float('NaN') if x == '' else x for x in data['data_table'][i]]

    ## change the main matrix to float array
    data['data_table'] = np.array(data['data_table']).astype(float)
    data['electron_counts'] = np.array(data['electron_counts']).astype(float)

    ## build a new dict whose keys are like '1s1/2', '2s1/2', '2p1/2', '2p3/2', '3s1/2', '3p1/2', '3p3/2'...
    new_lenth = len(data['electron_counts'])
    new_value=np.concatenate((data['data_table'].T[-new_lenth:].T,[data['electron_counts']]),axis=0).T
    new_dic = {}
    for i in range(new_lenth):
        new_key = data['headers'][-new_lenth:][i]
        new_dic[new_key]=new_value[i]

    ## add the photon energy array
    energy_array = np.array(data['data_table']).T[0]
    new_dic['PhotonEnergy'] = energy_array

    ## match the input energy
    index = np.where(new_dic['PhotonEnergy']==energy)[0][0]

    ## build result dict
    res_dict = {}

    ## result for s orbital
    c_s = np.array([new_dic[key] for key in new_dic if 's' in key]).T[index]
    n_electrons = np.array([new_dic[key] for key in new_dic if 's' in key]).T[-1]
    unit_c_s = np.true_divide(c_s,n_electrons)


    value_s = np.max(np.nan_to_num(unit_c_s))

    res_dict['s'] = value_s

    ## result for 'p', 'd', 'f' orbitals
    orbitals = ['p', 'd', 'f']


    for i in orbitals:
        main_matrix = np.array([new_dic[key] for key in new_dic if i in key])
        if np.shape(main_matrix) != (0,):
            if reference == 'Scofield':
                c_s = main_matrix.T[index]

                n_electrons = main_matrix.T[-1]
                unit_c_s = np.true_divide(c_s,n_electrons)
                unit_c_s = np.array([unit_c_s[i:i+2] for i in range(0, len(unit_c_s), 2)])
                percent =np.array([np.true_divide(c_s[i:i+2],c_s[i:i+2].sum()) for i in range(0, len(c_s), 2)])
                result = np.array(list(map(sum,unit_c_s*percent)))

                value = np.max(np.nan_to_num(result))
                res_dict[i] = value
            else:
                c_s = main_matrix.T[index]
                n_electrons = main_matrix.T[-1]
                unit_c_s = np.true_divide(c_s,n_electrons)
                value = np.max(np.nan_to_num(unit_c_s))
                res_dict[i] = value

    return res_dict


def get_metadata(energy,reference):
    dict = {}
    dict['energy'] = energy
    if reference == 'Scofield':
        dict['reference'] = 'J.H. Scofield, Theoretical photoionization cross sections from 1 to 1500 keV'
        dict['link'] = 'https://doi.org/10.2172/4545040'
    else:
        dict['reference'] = 'Yeh, J.J. and Lindau, I. (1985) Atomic Data and Nuclear Data Tables 32 pp 1-155'
        dict['link'] = 'https://doi.org/10.1016/0092-640X(85)90016-6'
    return dict


def get_cross_section_from_csv(elements,energy,reference):
    result = {}
    metadata = get_metadata(energy,reference)
    result.update(metadata)


    for element in elements:

        if reference == 'Scofield':
            filename = 'Scofield_csv_database.tar.gz'
            filepath = 'Scofield_csv_database/Z_{element1}.csv'
        else:
            filename ='Yeh_Lindau_1985_Xsection_CSV_Database.tar.gz'
            filepath = 'Yeh_Lindau_1985_Xsection_CSV_Database/{element1}.csv'

        filepath = filepath.format(element1 = element)
        data = read_csv_file(filename,filepath)

        cross_sections = _cross_sections_from_csv_data(energy,data,reference)
        result[element] = cross_sections

    return result




import tarfile
import numpy as np


def read_csv_file(tar_file_name, file_path):
    """
    Args:
        tar_file_name (str): path to tarfile of CSV data
        file_path (str): path to individual CSV file within tarfile

    Returns:
        dict: containing 'headers' and 'electron_counts' (lists of str)
            and 'data_table', a 2-D nested list of str. Values are not
            converted to numbers here; missing data is represented as ''.

    """

    # Open tarfile
    with tarfile.open(tar_file_name) as tf:
        with tf.extractfile(file_path) as hello:
            # get data as string
            data = hello.read().decode()
            # string to list of lines
            data_string = data.split('\r\n')

    # get number of column headers
    column_headers = [i for i in data_string[0].split(',') if i != '']
    length = len(column_headers)

    # build main matrix
    main_matrix = []
    rows = range(len(data_string))
    for row in rows:
        data_each_row = data_string[row].split(',')[0:length]
        main_matrix.append(data_each_row)

    # build cross sections table
    empty_value = main_matrix[-2]
    # remove empty rows
    midterm = [i for i in main_matrix if i != empty_value]
    new_main_matrix = midterm[0:-2]

    # build result dict
    result_dict = {}
    result_dict['headers'] = column_headers
    result_dict['electron_counts'] = [i for i in midterm[-2] if i != ''][1:]
    result_dict['data_table'] = new_main_matrix[1:]

    return result_dict


def _cross_sections_from_csv_data(energy, data, reference):
    """
    Args:
        energy (float): energy value
        data (dict): data from read_csv_file()
        reference (str): 'Scofield' or 'Yeh'

    Note: 1.'Scofield' for J. H. Scofield (1973)
            Lawrence Livermore National Laboratory Report No. UCRL-51326
          2.'Yeh' for Yeh, J.J. and Lindau, I. (1985)
            Atomic Data and Nuclear Data Tables 32 pp 1-155

    Returns:
        orbitals_cross_sections_dict: containing orbitals 's', 'p', 'd', 'f'
            and the cross section of each orbital.
            Orbitals with no columns in the table are omitted;
            missing (NaN) values are treated as zero.

    """

    # replace '' in data table with NaN
    for row in range(len(data['data_table'])):
        data['data_table'][row] = [
            float('NaN') if x == '' else x for x in data['data_table'][row]]

    # change the data_table and electron_counts to float arrays
    data['data_table'] = np.array(data['data_table']).astype(float)
    data['electron_counts'] = np.array(data['electron_counts']).astype(float)

    # Build new_dic, whose keys are like '1s1/2', '2s1/2', '2p1/2', '2p3/2', '3s1/2', '3p1/2', '3p3/2'...
    # and whose values are the cross sections of each orbital followed by its number of electrons.
    # This is used to calculate the max cross sections of the 's', 'p', 'd', 'f' orbitals.
    new_dic = {}
    orbitals_number = len(data['electron_counts'])
    # append the electron count of each orbital to its cross sections
    new_value = np.concatenate(
        (data['data_table'].T[-orbitals_number:].T, [data['electron_counts']]), axis=0).T
    for orbital in range(orbitals_number):
        new_key = data['headers'][-orbitals_number:][orbital]
        new_dic[new_key] = new_value[orbital]

    # add energy array to new_dic
    energy_array = np.array(data['data_table']).T[0]
    new_dic['PhotonEnergy'] = energy_array

    # match the input energy
    index = np.where(new_dic['PhotonEnergy'] == energy)[0][0]

    # build result dict
    orbitals_cross_sections_dict = {}

    # result for s orbital
    s_cross_sections = np.array([new_dic[key]
                                 for key in new_dic if 's' in key]).T[index]
    electrons_number = np.array([new_dic[key]
                                 for key in new_dic if 's' in key]).T[-1]
    # get unit cross sections
    unit_cross_sections = np.true_divide(s_cross_sections, electrons_number)
    # get max cross section of orbital s
    max_cross_section = np.max(np.nan_to_num(unit_cross_sections))
    orbitals_cross_sections_dict['s'] = max_cross_section

    # result for 'p', 'd', 'f' orbitals
    orbitals = ['p', 'd', 'f']
    for orbital in orbitals:
        interm_matrix = np.array([new_dic[key]
                                  for key in new_dic if orbital in key])
        if np.shape(interm_matrix) != (0,):
            if reference == 'Scofield':
                orbital_cross_sections = interm_matrix.T[index]
                electrons_number = interm_matrix.T[-1]
                unit_cross_sections = np.true_divide(
                    orbital_cross_sections, electrons_number)

                # for orbitals like '2p1/2', '2p3/2' we need to calculate a weighted mean value as the result cross_section
                unit_cross_sections_array = np.array(
                    [unit_cross_sections[i:i+2] for i in range(0, len(unit_cross_sections), 2)])
                weight = np.array([np.true_divide(orbital_cross_sections[i:i+2], orbital_cross_sections[i:i+2].sum())
                                   for i in range(0, len(orbital_cross_sections), 2)])
                result = np.array(
                    list(map(sum, unit_cross_sections_array*weight)))
                # get max cross section of this orbital
                max_cross_section = np.max(np.nan_to_num(result))
                orbitals_cross_sections_dict[orbital] = max_cross_section

            elif reference == 'Yeh':
                orbital_cross_sections = interm_matrix.T[index]
                electrons_number = interm_matrix.T[-1]
                unit_cross_sections = np.true_divide(
                    orbital_cross_sections, electrons_number)
                # get max cross section of this orbital
                max_cross_section = np.max(np.nan_to_num(unit_cross_sections))
                orbitals_cross_sections_dict[orbital] = max_cross_section

    return orbitals_cross_sections_dict
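

As a worked illustration of the Scofield branch above, the following sketch reproduces the combination for a single spin-orbit pair such as '2p1/2' and '2p3/2'. The function name `combine_spin_orbit_pair` and the numbers are hypothetical. Note that, as written in the diff, each per-electron cross section is weighted by that orbital's share of the pair's total cross section, not by its electron count; the sketch mirrors the code, not the comment.

```python
def combine_spin_orbit_pair(cross_sections, electron_counts):
    """Combine one spin-orbit pair the way the Scofield branch does.

    Each per-electron cross section (cross section / electron count) is
    weighted by that orbital's fraction of the pair's total cross section.
    """
    total = sum(cross_sections)
    return sum((cs / n) * (cs / total)
               for cs, n in zip(cross_sections, electron_counts))


# Hypothetical pair: cross sections 2.0 and 4.0, holding 2 and 4 electrons
value = combine_spin_orbit_pair([2.0, 4.0], [2, 4])
```

In this example both orbitals have a per-electron cross section of 1.0, so any normalised weighting gives 1.0; the choice of weights only matters when the per-electron values differ.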


def get_metadata(energy, reference):
    """
    Args:
        energy (float): energy value
        reference (str): 'Scofield' or 'Yeh'

    Note: 1.'Scofield' for J. H. Scofield (1973)
            Lawrence Livermore National Laboratory Report No. UCRL-51326
          2.'Yeh' for Yeh, J.J. and Lindau, I. (1985)
            Atomic Data and Nuclear Data Tables 32 pp 1-155

    Returns:
        metadata_dict: containing the input energy value
            and description of input reference

    """

    metadata_dict = {}
    metadata_dict['energy'] = energy
    if reference == 'Scofield':
        metadata_dict['reference'] = 'J.H. Scofield, Theoretical photoionization cross sections from 1 to 1500 keV'
        metadata_dict['link'] = 'https://doi.org/10.2172/4545040'
    elif reference == 'Yeh':
        metadata_dict['reference'] = 'Yeh, J.J. and Lindau, I. (1985) Atomic Data and Nuclear Data Tables 32 pp 1-155'
        metadata_dict['link'] = 'https://doi.org/10.1016/0092-640X(85)90016-6'
    else:
        # calling the dict, as the diff did, would raise a TypeError
        raise ValueError('Wrong reference')
    return metadata_dict


def get_cross_section_from_csv(elements, energy, reference):
    """
    Args:
        elements (list of str): element name list
            for Scofield data such as ['Z__1_H_','Z_13_Al',....]
            for Yeh data such as ['1_H','13_Al',...]

        energy (float): energy value
        reference (str): 'Scofield' or 'Yeh'

    Note: 1.'Scofield' for J. H. Scofield (1973)
            Lawrence Livermore National Laboratory Report No. UCRL-51326
          2.'Yeh' for Yeh, J.J. and Lindau, I. (1985)
            Atomic Data and Nuclear Data Tables 32 pp 1-155

    Returns:
        result (dict): containing energy value, reference information,
            and orbital cross sections dict of input elements

    """

    result = {}
    metadata = get_metadata(energy, reference)
    result.update(metadata)

    for element in elements:

        if reference == 'Scofield':
            filename = 'Scofield_csv_database.tar.gz'
            filepath = 'Scofield_csv_database/{element1}.csv'
        else:
            filename = 'Yeh_Lindau_1985_Xsection_CSV_Database.tar.gz'
            filepath = 'Yeh_Lindau_1985_Xsection_CSV_Database/{element1}.csv'

        filepath = filepath.format(element1=element)
        data = read_csv_file(filename, filepath)

        cross_sections = _cross_sections_from_csv_data(energy, data, reference)
        result[element] = cross_sections

    return result
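

One spot in the diff that may be fragile is the exact comparison `new_dic['PhotonEnergy'] == energy`: if the requested energy differs from the tabulated value by floating-point rounding, or is simply not in the table, `np.where(...)[0][0]` raises an `IndexError` with no explanation. A minimal sketch of a more forgiving lookup follows; the helper name `find_energy_index` and the tolerance are assumptions, not part of this PR.

```python
import math


def find_energy_index(energies, energy, rel_tol=1e-6):
    """Return the index of the first tabulated energy close to `energy`.

    Raises ValueError with a readable message when nothing matches.
    """
    for index, tabulated in enumerate(energies):
        if math.isclose(tabulated, energy, rel_tol=rel_tol):
            return index
    raise ValueError(f"Energy {energy} is not tabulated")


# Hypothetical energy column; the request differs only by rounding noise
energies = [1000.0, 1486.6, 2000.0]
index = find_energy_index(energies, 1486.6000000001)
```

The same loop works on a NumPy array such as `energy_array`, since it only iterates over the values.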