But now we no longer make individual Segments in the 'generic' format (that replaced 'csv').
Instead we make entire Sequences using from_keyword, parsing csv files with pandas to avoid the maintenance work of doing it ourselves.
(We grab a sequence at a time from the dataframe, and then get arrays from each column 'onset_s', 'offset_s', etc., via the values attribute.)
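A rough sketch of that "one sequence at a time" pattern, using only pandas. The grouping column name 'sequence', the sample csv text, and the helper `arrays_per_sequence` are assumptions for illustration, not the actual code in generic.py; the resulting arrays are what would then be handed to from_keyword.

```python
import io

import pandas as pd

# Hypothetical csv contents; the 'sequence' column is an assumed grouping key.
CSV_TEXT = """label,onset_s,offset_s,sequence
a,0.123,0.170,0
b,0.200,0.260,0
a,0.050,0.110,1
"""


def arrays_per_sequence(csv_file):
    """Return one dict of numpy arrays per sequence found in the csv."""
    df = pd.read_csv(csv_file)
    seqs = []
    # groupby yields one sub-dataframe per sequence, sorted by the key
    for _, df_seq in df.groupby("sequence"):
        seqs.append(
            {
                "labels": df_seq["label"].values,
                "onsets_s": df_seq["onset_s"].values,
                "offsets_s": df_seq["offset_s"].values,
            }
        )
    return seqs


seqs = arrays_per_sequence(io.StringIO(CSV_TEXT))
```

Letting pandas do the parsing means the floats arrive as floats, so no per-row string conversion is needed.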
I realized Segment.from_row is no longer used as I was trying to write examples for the Segment docstring as suggested here in the pyOpenSci review: pyOpenSci/software-submission#68 (comment)
We should just remove it and do the more intuitive thing with attrs:
use the attrs optional and have these default to None (NoneType, not a string "None")
then do the __post_init__ thing (__attrs_post_init__ in attrs) where we throw an error unless at least one of them is specified
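Something like the following is what I have in mind. This is a minimal sketch, not the final implementation: the class name `SegmentSketch` and the exact error message are made up for illustration, and it assumes "one of them" means one pair, either seconds or samples.

```python
import attr


@attr.s
class SegmentSketch:
    """Hypothetical stand-in for Segment, for illustration only."""

    label = attr.ib(converter=str)
    # optional(...) leaves a real None untouched and converts anything else
    onset_s = attr.ib(default=None, converter=attr.converters.optional(float))
    offset_s = attr.ib(default=None, converter=attr.converters.optional(float))
    onset_sample = attr.ib(default=None, converter=attr.converters.optional(int))
    offset_sample = attr.ib(default=None, converter=attr.converters.optional(int))

    def __attrs_post_init__(self):
        # attrs calls this hook after the generated __init__ has run
        have_s = self.onset_s is not None and self.offset_s is not None
        have_sample = (
            self.onset_sample is not None and self.offset_sample is not None
        )
        if not (have_s or have_sample):
            raise ValueError(
                "must specify onset_s and offset_s, "
                "or onset_sample and offset_sample"
            )


# Passing only the values in seconds now works; the sample fields stay None.
seg = SegmentSketch(label="a", onset_s=0.123, offset_s=0.170)
```

With this, the defaults are a real None, and the "at least one pair" rule lives in one obvious place instead of being spread across string-matching converters.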
Dumping a more detailed explanation of issues caused by from_row, from my dev notes:
I can't just make an instance by passing in args
It throws an error if I only pass in onset_s / offset_s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Segment.__init__() missing 2 required positional arguments: 'onset_sample' and 'offset_sample'
which is first of all confusing because it looks like these are optional, and I believe that was my intent? From the Segment.__init__ method
But then these converters are doing something extra weird, looking for a string "None":
def float_or_None(val):
    """converter that casts val to float, or returns None if val is string 'None'"""
    if val == "None":
        return None
    else:
        return float(val)
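To make the weirdness concrete, here is a small self-contained demo (the converter is copied from above; the variable names are just for illustration). The string "None" is accepted, but an actual None is not, because it falls through to float().

```python
def float_or_None(val):
    """converter that casts val to float, or returns None if val is string 'None'"""
    if val == "None":
        return None
    else:
        return float(val)


# A string from a csv row converts as intended.
from_string = float_or_None("0.123")

# The string "None" is special-cased and becomes a real None.
from_none_string = float_or_None("None")

# But a real None is not special-cased, so float(None) raises TypeError.
try:
    float_or_None(None)
    real_none_raises = False
except TypeError:
    real_none_raises = True
```

So the converter only makes sense for rows of strings, which is exactly the from_row use case.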
What's actually going on is that these are doing the work of a Segment.from_row method that can handle a row of strings from a csv. See this unit test for an example:
def test_Segment_init_onset_offset_in_seconds_from_row():
    header = ["label", "onset_s", "offset_s", "onset_sample", "offset_sample"]
    row = ["a", "0.123", "0.170", "None", "None"]
    a_segment = Segment.from_row(row=row, header=header)
    for attr in ["label", "onset_s", "offset_s"]:
        assert hasattr(a_segment, attr)
    for attr in ["onset_sample", "offset_sample"]:
        assert getattr(a_segment, attr) is None
Segment.from_row is no longer used, and the leftover logic for it is confusing.
This method was used by the now-removed 'csv' format; see for example crowsetta/src/crowsetta/csv.py, line 224 in a47798a.
The code that now builds Sequences with from_keyword is at crowsetta/src/crowsetta/formats/seq/generic.py, line 265 in f95b08a.