Pyx datatypes #1357

mkolopanis · 2023-11-07T18:16:29Z

Switches to using unsigned 64-bit integers for antenna and baseline numbers in the Cython extensions.
Updates the rectangularity calculation to be compatible with unsigned integers.
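
As a rough illustration of why the rectangularity check needed changing, the sketch below shows `np.diff` wrapping around on unsigned integers, next to an equality comparison that avoids the subtraction. The array values are made up for illustration, and this is not the code in the extension.

```python
import numpy as np

# Hypothetical baseline numbers; the values are illustrative only.
baselines = np.array([5, 3, 3, 7], dtype=np.uint64)

# Subtracting unsigned integers wraps around instead of going negative:
# 3 - 5 becomes 2**64 - 2.
steps = np.diff(baselines)
wrapped = int(steps[0])

# Comparing neighbours directly never subtracts, so nothing can wrap.
same_as_previous = baselines[1:] == baselines[:-1]
```

Here `wrapped` is a huge positive number rather than -2, so any check that relies on the sign or size of a difference has to be rewritten for unsigned input.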

Forces casting some moduli to C types in the Cython extension and uses C division with those numbers (otherwise Python tries to cast them to Python ints anyway). I expected this would help performance for large baseline numbers (>2**22 + 2**16, see #1354), but it doesn't seem to. Regardless, this is more consistent with the other implementations in the extension.

closes #1353

codecov · 2023-11-07T18:17:41Z

Codecov Report

Merging #1357 (fcdc2d9) into main (0b98ea0) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1357   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files          36       36           
  Lines       20228    20230    +2     
=======================================
+ Hits        20212    20214    +2     
  Misses         16       16

Files	Coverage Δ
pyuvdata/utils.py	`100.00% <100.00%> (ø)`
pyuvdata/utils.pyx	`100.00% <100.00%> (ø)`
pyuvdata/uvcal/initializers.py	`100.00% <100.00%> (ø)`
pyuvdata/uvdata/uvdata.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b98ea0...fcdc2d9.

mkolopanis · 2023-11-07T18:19:56Z

Annotated Cython output for the changes. It says the result should be plain C.

steven-murray

Not sure I can quite speak to all the Cython bits, but it looks reasonable. The rectangularity part looks good.

bhazelton · 2023-11-14T00:02:27Z

It looks like there are some hera_cal errors popping up in check calls about ant_array (for UVCal) and ant_1_array (for UVData) being float64 rather than int.

mkolopanis · 2023-11-14T16:39:06Z

Oh wow, this is all in an init loop... This is going to need more looking into. But the changes here don't seem like they should be causing this.

mkolopanis · 2023-11-14T17:38:16Z

I think I found what is happening. I actually wonder if this is a numpy bug. When you call intersect1d on two arrays, one uint64 and the other int64, you get a return type of float64. This happens in the uvcal initializers:

pyuvdata/pyuvdata/uvcal/initializers.py

Lines 481 to 484 in bd03009

    
    if antenna_numbers is not None:
        ant_array = np.intersect1d(ant_array, antenna_numbers)
    elif isinstance(antenna_positions, dict):
        ant_array = np.intersect1d(ant_array, list(antenna_positions.keys()))

In [2]: import numpy as np

In [3]: a1 = np.arange(12, dtype=np.uint64)

In [4]: a2 = np.arange(5,10)

In [5]: a2.dtype
Out[5]: dtype('int64')

In [6]: a1.dtype
Out[6]: dtype('uint64')

In [7]: a3 = np.intersect1d(a1,a2)

In [8]: a3.dtype
Out[8]: dtype('float64')

mkolopanis · 2023-11-14T17:40:23Z

NumPy would probably say that float64 is the more generic form: a superset of numbers that can deal with both types.
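
A minimal sketch of one way to sidestep the promotion, assuming it is acceptable to cast the second input to the dtype of the first before intersecting. The variable names mirror the session above; the cast is an illustration, not necessarily the exact change made in the initializers.

```python
import numpy as np

a1 = np.arange(12, dtype=np.uint64)
a2 = np.arange(5, 10, dtype=np.int64)

# NumPy has no integer type that holds both uint64 and int64,
# so the common type it picks is float64.
common = np.result_type(a1.dtype, a2.dtype)

# Casting one input to the other's dtype first keeps the result integral.
# This assumes a2 contains no negative values.
a3 = np.intersect1d(a1, np.asarray(a2, dtype=a1.dtype))
```

With matching dtypes, `a3` stays uint64, which is what the check calls on ant_array expect.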

mkolopanis · 2023-11-14T18:33:02Z

I cannot figure out why the second error is happening, though.
In hera_cal's lst bin tests, some antpairs are int64 and one set is uint64, so when the antpairs are cast to an array we get floats again.

For example, I added this at line 2756 in hera_cal/io.py:

    print(f"{data._antpairs=:} {[type(a) for ap in data._antpairs for a in ap]}")

and get

antpairs=[(1, 1), (3, 3), (14, 14), (36, 36), (52, 100), (98, 143), (102, 144), (103, 176), (127, 162), (135, 166), (140, 158), (164, 191)]
[<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.uint64'>, <class 'numpy.uint64'>]

but can't figure out why there are different types
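
The float result can be reproduced in isolation. The sketch below builds a short list of antenna pairs with mixed scalar types (values borrowed from the output above) and shows that passing an explicit dtype keeps the array integral. It says nothing about where the mixed types come from in hera_cal.

```python
import numpy as np

# Two pairs taken from the printed list, one int64 and one uint64.
antpairs = [
    (np.int64(1), np.int64(1)),
    (np.uint64(52), np.uint64(100)),
]

# Letting NumPy infer the dtype promotes int64 + uint64 to float64.
inferred = np.array(antpairs)

# Requesting the dtype explicitly avoids the promotion.
explicit = np.array(antpairs, dtype=np.uint64)
```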

mkolopanis · 2023-11-16T21:28:30Z

Just found the tracking bug for this: numpy/numpy#20905

mkolopanis · 2023-11-27T22:29:20Z

This last error seems to come from iterative operations on lst_bin in hera_cal. @steven-murray has mentioned that this function is deprecated and will be removed from hera_cal in the future. I can propose a downstream workaround for hera_cal to fix this in their code, but I can't figure out the root cause.

steven-murray · 2023-11-28T10:48:43Z

@mkolopanis thanks for that. What's your proposed workaround in the meantime so we can get tests running?

mkolopanis · 2023-11-28T18:23:54Z

> @mkolopanis thanks for that. What's your proposed workaround in the meantime so we can get tests running?

Here's a patch:

diff --git a/hera_cal/utils.py b/hera_cal/utils.py
index 8ab68539..5a96b1aa 100644
--- a/hera_cal/utils.py
+++ b/hera_cal/utils.py
@@ -126,12 +126,12 @@ def comply_pol(pol):
 def split_bl(bl):
     '''Splits a (i,j,pol) baseline key into ((i,pi),(j,pj)), where pol=pi+pj.'''
     pi, pj = split_pol(bl[2])
-    return ((bl[0], pi), (bl[1], pj))
+    return ((np.uint64(bl[0]), pi), (np.uint64(bl[1]), pj))
 
 
 def join_bl(ai, aj):
     '''Joins two (i,pi) antenna keys to make a (i,j,pol) baseline key.'''
-    return (ai[0], aj[0], join_pol(ai[1], aj[1]))
+    return (np.uint64(ai[0]), np.uint64(aj[0]), join_pol(ai[1], aj[1]))
 
 
 def reverse_bl(bl):
@@ -141,7 +141,7 @@ def reverse_bl(bl):
     if len(bl) == 2:
         return (j, i)
     else:
-        return (j, i, conj_pol(_comply_vispol(bl[2])))
+        return (np.uint64(j), np.uint64(i), conj_pol(_comply_vispol(bl[2])))
 
 
 def comply_bl(bl):
@@ -151,7 +151,7 @@ def comply_bl(bl):
         return bl
     else:
         i, j, p = bl
-        return (i, j, _comply_vispol(p))
+        return (np.uint64(i), np.uint64(j), _comply_vispol(p))
 
 
 def make_bl(*args):
@@ -163,7 +163,7 @@ def make_bl(*args):
         (i, j), pol = args
     else:
         i, j, pol = args
-    return (i, j, _comply_vispol(pol))
+    return (np.uint64(i), np.uint64(j), _comply_vispol(pol))
 
 
 def filter_bls(bls, ants=None, ex_ants=None, pols=None, antpos=None, min_bl_cut=None, max_bl_cut=None):
@@ -189,16 +189,17 @@ def filter_bls(bls, ants=None, ex_ants=None, pols=None, antpos=None, min_bl_cut=
 
     for bl in bls:
         ant1, ant2 = split_bl(bl)
+
         # filter on antennas to keep
-        if (ants is not None) and (ant1 not in ants) and (ant1[0] not in ants):
+        if (ants is not None) and (ant1 not in ants) and (ant1[0].item() not in ants):
             continue
-        if (ants is not None) and (ant2 not in ants) and (ant2[0] not in ants):
+        if (ants is not None) and (ant2 not in ants) and (ant2[0].item() not in ants):
             continue
 
         # filter on antennas to exclude
-        if (ex_ants is not None) and ((ant1 in ex_ants) or (ant1[0] in ex_ants)):
+        if (ex_ants is not None) and ((ant1 in ex_ants) or (ant1[0].item() in ex_ants)):
             continue
-        if (ex_ants is not None) and ((ant2 in ex_ants) or (ant2[0] in ex_ants)):
+        if (ex_ants is not None) and ((ant2 in ex_ants) or (ant2[0].item() in ex_ants)):
             continue
 
         # filter on polarizations
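
To illustrate what the casts in the patch do, here is a small sketch with a hypothetical helper, `as_uint64_bl`, that is not part of hera_cal. It shows that the cast yields NumPy scalars of a single type and that `.item()` converts one back to a plain Python int, as the patched `filter_bls` does before its membership tests.

```python
import numpy as np


def as_uint64_bl(i, j, pol):
    """Hypothetical helper: cast both antenna numbers to np.uint64."""
    return (np.uint64(i), np.uint64(j), pol)


bl = as_uint64_bl(np.int64(52), 100, "ee")

# Both antenna numbers now share one NumPy scalar type.
ant_types = {type(bl[0]), type(bl[1])}

# .item() returns the equivalent built-in Python int.
ant_as_int = bl[0].item()
```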

bhazelton

Looks good to me. Waiting to hear if @steven-murray has any concerns.

steven-murray · 2023-12-06T07:55:43Z

No concerns, but I would like to apply the patch @mkolopanis recommended to hera_cal to see if it comes right

mkolopanis requested review from steven-murray and plaplant November 7, 2023 18:16

steven-murray previously approved these changes Nov 13, 2023

bhazelton added the technical debt label Nov 13, 2023

mkolopanis dismissed steven-murray’s stale review via a501e76 November 14, 2023 18:31

mkolopanis force-pushed the pyx_datatypes branch from a501e76 to 489075e November 27, 2023 22:29

bhazelton reviewed Dec 6, 2023

bhazelton previously approved these changes Dec 6, 2023

This was referenced Dec 6, 2023

fix: use uint dtype for ant numbers to comply with pyuvdata HERA-Team/hera_cal#925

Closed

fix: enforce that ant numbers are uint64 HERA-Team/hera_cal#926

Merged

mkolopanis added 8 commits December 7, 2023 10:18

change baseline/antnum conversions to use u64

4f6990e

re-add min bl comparison

988e059

check if bls are same, using diff on uints gives over/under flow

d9149ed

define constants for "large" numbers in cython extension

f8cac86

use diff and comparison with rectangularity calc

d701cdd

use c division for large antenna numbers

9901971

force dtypes to match on uvcal inits, otherwise you get floats

66ae749

add a test where ant_array is not a np.array

fcdc2d9

mkolopanis dismissed bhazelton’s stale review via fcdc2d9 December 7, 2023 17:19

mkolopanis force-pushed the pyx_datatypes branch from 4d54e93 to fcdc2d9 December 7, 2023 17:19

bhazelton approved these changes Dec 7, 2023

bhazelton merged commit c643bfd into main Dec 7, 2023
53 checks passed

bhazelton deleted the pyx_datatypes branch December 7, 2023 18:32
