Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyx datatypes #1357

Merged
merged 8 commits into from
Dec 7, 2023
Merged

Pyx datatypes #1357

merged 8 commits into from
Dec 7, 2023

Conversation

mkolopanis
Copy link
Member

@mkolopanis mkolopanis commented Nov 7, 2023

switches to using unsigned 64 bit integers for antenna and baseline numbers in the cython extensions.
Updates the rectangularity calculation to be compatible with unsigned integers.

Forces casting some modulus as c types in the cython extension and uses c division with those numbers (otherwise python tries to cast them as python ints anyway). I expected this would help performance for large numbered baseline (>2**22 + 2**16, see #1354) but it actually seems like it doesn't. Regardless this is more consistent with the other implementations in the extension.

closes #1353

Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Merging #1357 (fcdc2d9) into main (0b98ea0) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1357   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files          36       36           
  Lines       20228    20230    +2     
=======================================
+ Hits        20212    20214    +2     
  Misses         16       16           
Files Coverage Δ
pyuvdata/utils.py 100.00% <100.00%> (ø)
pyuvdata/utils.pyx 100.00% <100.00%> (ø)
pyuvdata/uvcal/initializers.py 100.00% <100.00%> (ø)
pyuvdata/uvdata/uvdata.py 100.00% <100.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b98ea0...fcdc2d9. Read the comment docs.

@mkolopanis
Copy link
Member Author

annotated cython output on the changes. says it should be just c
image

steven-murray
steven-murray previously approved these changes Nov 13, 2023
Copy link
Contributor

@steven-murray steven-murray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can quite speak to all the Cython bits, but it looks reasonable. The rectangularity part looks good.

@bhazelton
Copy link
Member

It looks like there are some hera_cal errors popping up in check calls about ant_array (for UVCal) and ant_1_array (for UVData) being float64 rather than int.

@mkolopanis
Copy link
Member Author

oh wow, this is all in an innit loop........ there's going to need to be more looking into it. But the changes here don't seem like they should be causing this.

@mkolopanis
Copy link
Member Author

I think I found what is happening. I actually wonder if this is a numpy bug. When you call intersect1d on two arrays, one is uint64 the other is int64, you get a return type of float64. Happens in uvcal initializers

if antenna_numbers is not None:
ant_array = np.intersect1d(ant_array, antenna_numbers)
elif isinstance(antenna_positions, dict):
ant_array = np.intersect1d(ant_array, list(antenna_positions.keys()))

In [2]: import numpy as np

In [3]: a1 = np.arange(12, dtype=np.uint64)

In [4]: a2 = np.arange(5,10)

In [5]: a2.dtype
Out[5]: dtype('int64')

In [6]: a1.dtype
Out[6]: dtype('uint64')

In [7]: a3 = np.intersect1d(a1,a2)

In [8]: a3.dtype
Out[8]: dtype('float64')

In [9]: 

@mkolopanis
Copy link
Member Author

probably what numpy would say is the more generic form, a superset of numbers that can deal with both types.

@mkolopanis
Copy link
Member Author

I cannot figure out why the second error is happening though.
in hera_cal's lst bin tests there's some antpairs that are int64 and one set that is uint64.. so when it casts the antpairs to an array we get floats again

like i added this in line 2756 in hera_cal/io.py

    print(f"{data._antpairs=:} {[type(a) for ap in data._antpairs for a in ap]}")

and get

antpairs=[(1, 1), (3, 3), (14, 14), (36, 36), (52, 100), (98, 143), (102, 144), (103, 176), (127, 162), (135, 166), (140, 158), (164, 191)]
[<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>]

but can't figure out why there are different types

@mkolopanis
Copy link
Member Author

just found the tracking bug for this numpy/numpy#20905

@mkolopanis
Copy link
Member Author

This last error seems to be from iterative operations on lst_bin in hera_cal. @steven-murray has disclosed this function is deprecated and will be removed in the future for hera_cal. I can propose a downstream workaround for hera_cal to fix this in their code but can't figure out the cause.

@steven-murray
Copy link
Contributor

@mkolopanis thanks for that. What's your proposed workaround in the meantime so we can get tests running?

@mkolopanis
Copy link
Member Author

@mkolopanis thanks for that. What's your proposed workaround in the meantime so we can get tests running?

here's a patch
diff --git a/hera_cal/utils.py b/hera_cal/utils.py
index 8ab68539..5a96b1aa 100644
--- a/hera_cal/utils.py
+++ b/hera_cal/utils.py
@@ -126,12 +126,12 @@ def comply_pol(pol):
 def split_bl(bl):
     '''Splits a (i,j,pol) baseline key into ((i,pi),(j,pj)), where pol=pi+pj.'''
     pi, pj = split_pol(bl[2])
-    return ((bl[0], pi), (bl[1], pj))
+    return ((np.uint64(bl[0]), pi), (np.uint64(bl[1]), pj))
 
 
 def join_bl(ai, aj):
     '''Joins two (i,pi) antenna keys to make a (i,j,pol) baseline key.'''
-    return (ai[0], aj[0], join_pol(ai[1], aj[1]))
+    return (np.uint64(ai[0]), np.uint64(aj[0]), join_pol(ai[1], aj[1]))
 
 
 def reverse_bl(bl):
@@ -141,7 +141,7 @@ def reverse_bl(bl):
     if len(bl) == 2:
         return (j, i)
     else:
-        return (j, i, conj_pol(_comply_vispol(bl[2])))
+        return (np.uint64(j), np.uint64(i), conj_pol(_comply_vispol(bl[2])))
 
 
 def comply_bl(bl):
@@ -151,7 +151,7 @@ def comply_bl(bl):
         return bl
     else:
         i, j, p = bl
-        return (i, j, _comply_vispol(p))
+        return (np.uint64(i), np.uint64(j), _comply_vispol(p))
 
 
 def make_bl(*args):
@@ -163,7 +163,7 @@ def make_bl(*args):
         (i, j), pol = args
     else:
         i, j, pol = args
-    return (i, j, _comply_vispol(pol))
+    return (np.uint64(i), np.uint64(j), _comply_vispol(pol))
 
 
 def filter_bls(bls, ants=None, ex_ants=None, pols=None, antpos=None, min_bl_cut=None, max_bl_cut=None):
@@ -189,16 +189,17 @@ def filter_bls(bls, ants=None, ex_ants=None, pols=None, antpos=None, min_bl_cut=
 
     for bl in bls:
         ant1, ant2 = split_bl(bl)
+
         # filter on antennas to keep
-        if (ants is not None) and (ant1 not in ants) and (ant1[0] not in ants):
+        if (ants is not None) and (ant1 not in ants) and (ant1[0].item() not in ants):
             continue
-        if (ants is not None) and (ant2 not in ants) and (ant2[0] not in ants):
+        if (ants is not None) and (ant2 not in ants) and (ant2[0].item() not in ants):
             continue
 
         # filter on antennas to exclude
-        if (ex_ants is not None) and ((ant1 in ex_ants) or (ant1[0] in ex_ants)):
+        if (ex_ants is not None) and ((ant1 in ex_ants) or (ant1[0].item() in ex_ants)):
             continue
-        if (ex_ants is not None) and ((ant2 in ex_ants) or (ant2[0] in ex_ants)):
+        if (ex_ants is not None) and ((ant2 in ex_ants) or (ant2[0].item() in ex_ants)):
             continue
 
         # filter on polarizations

setup.cfg Outdated Show resolved Hide resolved
bhazelton
bhazelton previously approved these changes Dec 6, 2023
Copy link
Member

@bhazelton bhazelton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Waiting to hear if @steven-murray has any concerns.

@steven-murray
Copy link
Contributor

No concerns, but I would like to apply the patch @mkolopanis recommended to hera_cal to see if it comes right

@bhazelton bhazelton merged commit c643bfd into main Dec 7, 2023
53 checks passed
@bhazelton bhazelton deleted the pyx_datatypes branch December 7, 2023 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unrealistic variable choice in utils extension
3 participants