Add support for filterbank files? #84
@wfarah Not using rawspec to generate .h5 files directly?
@wfarah you want `from hyperseti.io import from_fil`
(I'll update docs!)
You can also use `load_data`, which picks the right reader based on the input:
```python
import h5py
import setigen as stg
from blimpy.io import sigproc  # provides is_filterbank()

# NOTE: the hyperseti import paths below are assumptions about the package layout
from hyperseti.data_array import DataArray
from hyperseti.io import from_fil, from_h5, from_setigen


def load_data(filename: str) -> DataArray:
    """ Load file and return data array

    Args:
        filename (str): Name of file to load.

    Returns:
        ds (DataArray): DataArray object

    Notes:
        This method also supports input as a setigen Frame,
        or as an existing DataArray.
    """
    if isinstance(filename, DataArray):
        ds = filename  # User has actually supplied a DataArray
    elif isinstance(filename, str):
        if h5py.is_hdf5(filename):
            ds = from_h5(filename)
        elif sigproc.is_filterbank(filename):
            ds = from_fil(filename)
        else:
            raise RuntimeError("Only HDF5/filterbank files or DataArray/setigen.Frame objects are currently supported")
    elif isinstance(filename, stg.Frame):
        ds = from_setigen(filename)
    else:
        raise RuntimeError("Only HDF5/filterbank files or DataArray/setigen.Frame objects are currently supported")
    return ds
```
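For reference, the two file-type checks in `load_data` work by looking at the first bytes of the file. Below is a minimal standalone sketch of that idea, not the h5py or blimpy implementation. It assumes the HDF5 signature sits at offset 0 (HDF5 also allows it at 512, 1024, ... when a user block is present) and that the SIGPROC header is little-endian. `sniff_format` is a hypothetical helper.

```python
import struct

# Standard 8-byte HDF5 file signature
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_file(path):
    """Return True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE


def is_sigproc_filterbank(path):
    """Return True if the file starts with the SIGPROC 'HEADER_START' keyword.

    SIGPROC header keywords are stored as an int32 length (assumed
    little-endian here) followed by that many ASCII bytes.
    """
    with open(path, "rb") as f:
        raw = f.read(4 + len("HEADER_START"))
    if len(raw) < 16:
        return False
    (n,) = struct.unpack("<i", raw[:4])
    return n == 12 and raw[4:16] == b"HEADER_START"


def sniff_format(path):
    """Hypothetical dispatcher that mirrors the branch order in load_data."""
    if is_hdf5_file(path):
        return "hdf5"
    if is_sigproc_filterbank(path):
        return "filterbank"
    raise RuntimeError(f"Unrecognised file format: {path}")
```

Checking HDF5 first matches `load_data`. The two signatures cannot collide, so the order only matters for the error path.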
I should also document how to use the […] In terms of working with filterbank files, there's not too much to note, except that it's different to blimpy. The data are loaded using […]. You can see the source code here; it's refreshingly short: […] The […] Let me know how you go!
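The comment above describes how hyperseti loads filterbank data, but the details were lost. As general background, here is a sketch of what reading a SIGPROC filterbank involves. This is not hyperseti's `from_fil`. It parses the length-prefixed header keywords, then memory-maps the data block that follows. It assumes a little-endian file, 32-bit float data, and only the subset of SIGPROC keywords listed. `read_filterbank` is a hypothetical name.

```python
import struct

import numpy as np

# Value types for a subset of the standard SIGPROC header keywords
INT_KEYS = {"telescope_id", "machine_id", "data_type", "nchans", "nbits",
            "nifs", "nbeams", "ibeam", "barycentric", "pulsarcentric"}
DOUBLE_KEYS = {"fch1", "foff", "tstart", "tsamp", "src_raj", "src_dej",
               "az_start", "za_start", "refdm", "period"}
STRING_KEYS = {"source_name", "rawdatafile"}


def _read_string(f):
    """Read an int32 length followed by that many ASCII bytes."""
    (n,) = struct.unpack("<i", f.read(4))
    return f.read(n).decode("ascii")


def read_filterbank(path):
    """Parse a SIGPROC header and memory-map the data that follows it."""
    header = {}
    with open(path, "rb") as f:
        if _read_string(f) != "HEADER_START":
            raise ValueError("Not a SIGPROC filterbank file")
        while True:
            key = _read_string(f)
            if key == "HEADER_END":
                break
            if key in INT_KEYS:
                (header[key],) = struct.unpack("<i", f.read(4))
            elif key in DOUBLE_KEYS:
                (header[key],) = struct.unpack("<d", f.read(8))
            elif key in STRING_KEYS:
                header[key] = _read_string(f)
            else:
                raise ValueError(f"Unsupported header keyword: {key}")
        data_offset = f.tell()  # data start immediately after HEADER_END

    if header.get("nbits") != 32:
        raise ValueError("This sketch only handles 32-bit float data")
    nifs = header.get("nifs", 1)
    nchans = header["nchans"]
    # Without a shape, memmap maps everything from data_offset to end of file
    data = np.memmap(path, dtype="<f4", mode="r", offset=data_offset)
    return header, data.reshape(-1, nifs, nchans)  # (time, IF/pol, channel)
```

Memory-mapping means only the slices you touch are read from disk. This matters for large high-resolution files.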
PS: Here's a script I'm running on […]:
```python
import glob
import os
import pprint

from hyperseti import find_et
from hyperseti.io import load_config
from hyperseti.io.hit_db import HitDatabase

config_file = 'pksmb.yaml'      # Pipeline config file to load
hit_db = 'pks_mb.hitdb'         # Output HitDatabase name
fpath = '/datag/collate_mb/PKS_0240_2017-12-21T19:00/'
hires_ext = '*.0000.hires.hdf'  # Glob pattern to ID hi-freq res data
gpu_id = 3                      # GPU to attach to (blpc nodes can be busy!)

# Load config and glob folder list
blc_folders = sorted(glob.glob(os.path.join(fpath, 'blc*')))
config = load_config(config_file)
pprint.pprint(config)

# Create new hit database
db = HitDatabase(hit_db, mode='w')

# Loop through blcXX folders, and search for hires H5 files
# (these data are older)
for ff, folder in enumerate(blc_folders):
    filelist = glob.glob(os.path.join(folder, hires_ext))
    for ii, filename in enumerate(filelist):
        print(f"(node {ff + 1}/{len(blc_folders)}: file {ii + 1}/{len(filelist)}) Opening {filename}...")

        # A slightly more friendly observation ID, prepending blcXX to ensure ID is unique
        blc_id = os.path.basename(folder)
        obs_id = os.path.basename(filename)
        # Strip the '*' wildcard so the literal suffix actually matches the filename
        obs_id = obs_id.replace('guppi_', '').replace(hires_ext.lstrip('*'), '')
        obs_id = f'{blc_id}_{obs_id}'

        # Search for hits, file by file. Do not save to file here as we save to HitDatabase
        hit_browser = find_et(filename, config,
                              gulp_size=2**20,
                              gpu_id=gpu_id,
                              filename_out=None, log_output=False, log_config=False)

        # Save to the HitDatabase
        print(f"Saving to obs_id: {obs_id}")
        hit_browser.to_db(db, obs_id=obs_id)
```
And the config file:
```yaml
preprocess:
  blank_edges:
    n_chan: 1024
  normalize: true
  sk_flag:
    n_sigma: 10
  blank_extrema:
    threshold: 10000
  poly_fit: 5
dedoppler:
  apply_smearing_corr: true
  kernel: ddsk
  max_dd: 100.0
  min_dd: null
  plan: stepped
hitsearch:
  min_fdistance: null
  threshold: 20
pipeline:
  merge_boxcar_trials: true
  n_boxcar: 1
  n_blank: 8
```
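To try variations of this config (e.g. a narrower drift range) without editing the YAML file, you can merge overrides into the loaded config. This is a sketch that assumes `load_config` returns a plain nested dict, as the `pprint` call above suggests. `override_config` is a hypothetical helper, not part of hyperseti.

```python
import copy


def override_config(config, overrides):
    """Return a copy of a nested config dict with `overrides` merged in.

    Nested dicts are merged key by key; any other value replaces the original.
    The input config is left unchanged.
    """
    merged = copy.deepcopy(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Usage: a cut-down version of the config above, with the drift range narrowed
config = {
    "dedoppler": {"kernel": "ddsk", "max_dd": 100.0, "min_dd": None, "plan": "stepped"},
    "pipeline": {"merge_boxcar_trials": True, "n_boxcar": 1, "n_blank": 8},
}
narrow = override_config(config, {"dedoppler": {"max_dd": 1.0}})
print(narrow["dedoppler"])
```

Deep-copying keeps the original config intact, so several variants can be run from one loaded file.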
We have a real-time beamformer/upchannelizer that we usually use, and we are currently writing out […]. In the meantime, I think we can use the above example, so thanks for providing it!
I have tried to run the following:
@mirosaide you may have guessed it, but you ran out of memory on the GPU! You'll need to either:
1) free up memory on the GPU if other things are running,
2) narrow your Doppler drift range (maybe try +/- 1 Hz/s and see if it fits?), or
3) choose a smaller gulp size (2**18).
BTW, iterative blanking is faster now, and I would suggest using it if you're searching out to +/- 10 Hz/s. This config goes into `'pipeline'`:
```python
'pipeline': {
    'merge_boxcar_trials': True,
    'blank_hits': {'n_blank': 4, 'padding': 16}
}
```
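To see why options 2) and 3) help, here is a rough back-of-envelope estimate of the size of the dedoppler output. It assumes the output is a float32 array with one row per trial drift rate and one column per channel in the gulp. It also assumes drift rates run from -max_dd to +max_dd in steps of the unit drift rate foff / T_obs. hyperseti's actual drift plan (e.g. `stepped`) and extra buffers will change the real figure, so treat this as a lower-bound sketch. The parameter values (`foff_hz=2.79`, `n_int=16`, `tsamp_s=18.25`) are hypothetical.

```python
import math


def dedoppler_bytes(gulp_size, max_dd, foff_hz, n_int, tsamp_s, bytes_per_value=4):
    """Rough lower bound on the size of a dedoppler output array, in bytes.

    Assumes drift rates span -max_dd..+max_dd Hz/s in steps of the unit
    drift rate foff / T_obs, with one float32 per (drift, channel) pair.
    """
    t_obs = n_int * tsamp_s              # observation length in seconds
    unit_drift = abs(foff_hz) / t_obs    # Hz/s: one channel over the whole observation
    n_drifts = 2 * math.ceil(max_dd / unit_drift) + 1
    return n_drifts * gulp_size * bytes_per_value


# Hypothetical high-frequency-resolution parameters, for illustration only
params = dict(foff_hz=2.79, n_int=16, tsamp_s=18.25)
for gulp_size, max_dd in [(2**20, 10.0), (2**18, 10.0), (2**20, 1.0)]:
    gb = dedoppler_bytes(gulp_size, max_dd, **params) / 1e9
    print(f"gulp=2**{int(math.log2(gulp_size))}, max_dd=+/-{max_dd} Hz/s: ~{gb:.1f} GB")
```

Memory scales linearly with both gulp size and drift range. Going from 2**20 to 2**18 cuts it by 4x, and going from +/- 10 to +/- 1 Hz/s cuts it by roughly 10x.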
I tried again with:
Now getting:
Very impressive: it somehow managed to flag 200% of the data! (I'm not sure how that's possible.) My guess is that the data do not have Gaussian statistics, so everything is getting flagged. A preprocess STD of 0.0 will convert all data into NaN values, and then it's game over downstream. You may be able to continue by setting […]
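Here is a small illustration of why a zero STD is fatal. It assumes the `normalize: true` preprocessing step is a standard (x - mean) / std. If the data are constant (e.g. entirely flagged or zeroed), every value becomes 0/0 = NaN. `safe_normalize` is a hypothetical guard that fails loudly instead of propagating NaNs.

```python
import numpy as np


def normalize(data):
    """Subtract the mean and divide by the standard deviation."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return (data - data.mean()) / data.std()


def safe_normalize(data, eps=1e-12):
    """Like normalize, but raise when the std is zero, tiny, or not finite."""
    std = data.std()
    if not np.isfinite(std) or std < eps:
        raise ValueError(f"Standard deviation is {std}; cannot normalize")
    return (data - data.mean()) / std


flat = np.zeros((16, 1024), dtype=np.float32)  # e.g. data that were entirely flagged/zeroed
print(np.isnan(normalize(flat)).all())  # prints True: every value is 0/0
```

Checking the STD before normalizing surfaces the problem at the preprocessing stage rather than as empty or NaN-filled hit searches downstream.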
Set […]
I'd have to take a look to debug this one. Can you either send me the file (e.g. upload to blpd) or let me know where you're working? (I don't have login access to ATA, though.)
Hello,
@mirosaide That URL has a glitch.
The ATA currently outputs files in filterbank (`.fil`) format. Is there scope for adding support for filterbank file input to hyperseti?