Optimization of wfdb.io.annotation.field2bytes function #406

Merged
merged 4 commits into from
Aug 30, 2022

@Fegalf Fegalf commented Jul 4, 2022

Hi,

I noticed that writing an annotation file is slow when the file contains many annotations.
After line-profiling the writing functions, I found that field2bytes accounts for most of the execution time.

The problem turned out to be this line:
typecode = ann_label_table.loc[ann_label_table["symbol"] == value[1], "label_store"].values[0]

This filters the entire ann_label_table DataFrame for every value passed to field2bytes, which is slow. Instead, I added a dictionary that maps each symbol to its corresponding label_store value. The lookup is much faster: the profiled total drops from 86.361 s to 2.405 s (see the time profiler output below).
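For illustration, here is a minimal sketch of the two lookup strategies. The small table is a hypothetical stand-in for ann_label_table (its rows are illustrative only), and the helper names typecode_by_filter and typecode_by_dict are made up for this example; only the filter expression and the typecodes[value[1]] lookup come from the actual change.

```python
import pandas as pd

# Hypothetical stand-in for ann_label_table; rows are illustrative only.
ann_label_table = pd.DataFrame(
    {
        "label_store": [1, 5, 8],
        "symbol": ["N", "V", "A"],
    }
)


def typecode_by_filter(symbol):
    # Old approach: build a boolean mask over the whole table on every call.
    return ann_label_table.loc[
        ann_label_table["symbol"] == symbol, "label_store"
    ].values[0]


# New approach: build the symbol -> label_store mapping once...
typecodes = dict(zip(ann_label_table["symbol"], ann_label_table["label_store"]))


def typecode_by_dict(symbol):
    # ...then each call is a single hash lookup.
    return typecodes[symbol]


# Both strategies agree on every symbol in the table.
for sym in ann_label_table["symbol"]:
    assert typecode_by_filter(sym) == typecode_by_dict(sym)
```

The dictionary only needs to be built once, so the per-annotation cost no longer depends on the size of the table.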

Time profilers

Current version

Total time: 86.361 s
File: wfdb/io/annotation.py
Function: field2bytes at line 1602

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1602                                           @profile
  1603                                           def field2bytes(field, value):
  1604                                               """
  1605                                               Convert an annotation field into bytes to write.
  1606                                           
  1607                                               Parameters
  1608                                               ----------
  1609                                               field : str
  1610                                                   The annotation field of the value to be converted to bytes.
  1611                                               value : list
  1612                                                   The value to be converted to bytes.
  1613                                           
  1614                                               Returns
  1615                                               -------
  1616                                               data_bytes : list, ndarray
  1617                                                   All of the bytes to be written to the annotation file.
  1618                                           
  1619                                               """
  1620    361156     273292.0      0.8      0.3      data_bytes = []
  1621                                           
  1622                                               # samp and sym bytes come together
  1623    361156     248245.0      0.7      0.3      if field == "samptype":
  1624                                                   # Numerical value encoding annotation symbol
  1625    179612   83467815.0    464.7     96.6          typecode = ann_label_table.loc[ann_label_table["symbol"] == value[1], "label_store"].values[0]
  1626                                                   #typecode = typecodes[value[1]]
  1627                                                   # sample difference
  1628    179612     236106.0      1.3      0.3          sd = value[0]
  1629                                           
  1630    179612     131775.0      0.7      0.2          data_bytes = []
  1631                                           
  1632                                                   # Add SKIP element(s) if the sample difference is too large to
  1633                                                   # be stored in the annotation type word.
  1634                                                   #
  1635                                                   # Each SKIP element consists of three words (6 bytes):
  1636                                                   #  - Bytes 0-1 contain the SKIP indicator (59 << 10)
  1637                                                   #  - Bytes 2-3 contain the high 16 bits of the sample difference
  1638                                                   #  - Bytes 4-5 contain the low 16 bits of the sample difference
  1639                                                   # If the total difference exceeds 2**31 - 1, multiple skips must
  1640                                                   # be used.
  1641    181444     255089.0      1.4      0.3          while sd > 1023:
  1642      1832       3423.0      1.9      0.0              n = min(sd, 0x7FFFFFFF)
  1643      1832        915.0      0.5      0.0              data_bytes += [
  1644      1832        931.0      0.5      0.0                  0,
  1645      1832        916.0      0.5      0.0                  59 << 2,
  1646      1832       2251.0      1.2      0.0                  (n >> 16) & 255,
  1647      1832       1563.0      0.9      0.0                  (n >> 24) & 255,
  1648      1832       1583.0      0.9      0.0                  (n >> 0) & 255,
  1649      1832       2294.0      1.3      0.0                  (n >> 8) & 255,
  1650                                                       ]
  1651      1832       1957.0      1.1      0.0              sd -= n
  1652                                           
  1653                                                   # Annotation type itself is stored as a single word:
  1654                                                   #  - bits 0 to 9 store the sample difference (0 to 1023)
  1655                                                   #  - bits 10 to 15 store the type code
  1656    179612     442489.0      2.5      0.5          data_bytes += [sd & 255, ((sd & 768) >> 8) + 4 * typecode]
  1657                                           
  1658    181544     100423.0      0.6      0.1      elif field == "num":
  1659                                                   # First byte stores num
  1660                                                   # second byte stores 60*4 indicator
  1661                                                   data_bytes = [value, 240]
  1662    181544      95246.0      0.5      0.1      elif field == "subtype":
  1663                                                   # First byte stores subtype
  1664                                                   # second byte stores 61*4 indicator
  1665      1932       1299.0      0.7      0.0          data_bytes = [value, 244]
  1666    179612      95012.0      0.5      0.1      elif field == "chan":
  1667                                                   # First byte stores num
  1668                                                   # second byte stores 62*4 indicator
  1669                                                   data_bytes = [value, 248]
  1670    179612     107277.0      0.6      0.1      elif field == "aux_note":
  1671                                                   # - First byte stores length of aux_note field
  1672                                                   # - Second byte stores 63*4 indicator
  1673                                                   # - Then store the aux_note string characters
  1674    179612     531112.0      3.0      0.6          data_bytes = [len(value), 252] + [ord(i) for i in value]
  1675                                                   # Zero pad odd length aux_note strings
  1676    179612     150545.0      0.8      0.2          if len(value) % 2:
  1677                                                       data_bytes.append(0)
  1678                                           
  1679    361156     209407.0      0.6      0.2      return data_bytes

New version

Total time: 2.40503 s
File: /home/nicolasbg/miniconda3/envs/physionet/lib/python3.7/site-packages/wfdb/io/annotation.py
Function: field2bytes at line 1602

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1602                                           @profile
  1603                                           def field2bytes(field, value):
  1604                                               """
  1605                                               Convert an annotation field into bytes to write.
  1606                                           
  1607                                               Parameters
  1608                                               ----------
  1609                                               field : str
  1610                                                   The annotation field of the value to be converted to bytes.
  1611                                               value : list
  1612                                                   The value to be converted to bytes.
  1613                                           
  1614                                               Returns
  1615                                               -------
  1616                                               data_bytes : list, ndarray
  1617                                                   All of the bytes to be written to the annotation file.
  1618                                           
  1619                                               """
  1620    361156     199665.0      0.6      8.3      data_bytes = []
  1621                                           
  1622                                               # samp and sym bytes come together
  1623    361156     213260.0      0.6      8.9      if field == "samptype":
  1624                                                   # Numerical value encoding annotation symbol
  1625    179612     121482.0      0.7      5.1          typecode = typecodes[value[1]]
  1626                                                   # sample difference
  1627    179612     100643.0      0.6      4.2          sd = value[0]
  1628                                           
  1629    179612      97847.0      0.5      4.1          data_bytes = []
  1630                                           
  1631                                                   # Add SKIP element(s) if the sample difference is too large to
  1632                                                   # be stored in the annotation type word.
  1633                                                   #
  1634                                                   # Each SKIP element consists of three words (6 bytes):
  1635                                                   #  - Bytes 0-1 contain the SKIP indicator (59 << 10)
  1636                                                   #  - Bytes 2-3 contain the high 16 bits of the sample difference
  1637                                                   #  - Bytes 4-5 contain the low 16 bits of the sample difference
  1638                                                   # If the total difference exceeds 2**31 - 1, multiple skips must
  1639                                                   # be used.
  1640    181444     147706.0      0.8      6.1          while sd > 1023:
  1641      1832       2554.0      1.4      0.1              n = min(sd, 0x7FFFFFFF)
  1642      1832        986.0      0.5      0.0              data_bytes += [
  1643      1832        991.0      0.5      0.0                  0,
  1644      1832        968.0      0.5      0.0                  59 << 2,
  1645      1832       1856.0      1.0      0.1                  (n >> 16) & 255,
  1646      1832       1570.0      0.9      0.1                  (n >> 24) & 255,
  1647      1832       1548.0      0.8      0.1                  (n >> 0) & 255,
  1648      1832       2074.0      1.1      0.1                  (n >> 8) & 255,
  1649                                                       ]
  1650      1832       1556.0      0.8      0.1              sd -= n
  1651                                           
  1652                                                   # Annotation type itself is stored as a single word:
  1653                                                   #  - bits 0 to 9 store the sample difference (0 to 1023)
  1654                                                   #  - bits 10 to 15 store the type code
  1655    179612     253318.0      1.4     10.5          data_bytes += [sd & 255, ((sd & 768) >> 8) + 4 * typecode]
  1656                                           
  1657    181544     100786.0      0.6      4.2      elif field == "num":
  1658                                                   # First byte stores num
  1659                                                   # second byte stores 60*4 indicator
  1660                                                   data_bytes = [value, 240]
  1661    181544      99500.0      0.5      4.1      elif field == "subtype":
  1662                                                   # First byte stores subtype
  1663                                                   # second byte stores 61*4 indicator
  1664      1932       1163.0      0.6      0.0          data_bytes = [value, 244]
  1665    179612      98431.0      0.5      4.1      elif field == "chan":
  1666                                                   # First byte stores num
  1667                                                   # second byte stores 62*4 indicator
  1668                                                   data_bytes = [value, 248]
  1669    179612     102374.0      0.6      4.3      elif field == "aux_note":
  1670                                                   # - First byte stores length of aux_note field
  1671                                                   # - Second byte stores 63*4 indicator
  1672                                                   # - Then store the aux_note string characters
  1673    179612     541168.0      3.0     22.5          data_bytes = [len(value), 252] + [ord(i) for i in value]
  1674                                                   # Zero pad odd length aux_note strings
  1675    179612     120971.0      0.7      5.0          if len(value) % 2:
  1676                                                       data_bytes.append(0)
  1677                                           
  1678    361156     192616.0      0.5      8.0      return data_bytes
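To make the byte layout described in the comments concrete, the "samptype" branch can be pulled out into a standalone function. This is only a sketch: encode_samptype is a hypothetical name, and it takes the type code directly instead of looking it up from a symbol. The arithmetic is copied from the branch shown in the profiles above.

```python
def encode_samptype(sd, typecode):
    """Encode a sample difference and a type code as a list of byte values."""
    data_bytes = []

    # Emit a SKIP element (6 bytes) while the difference does not fit in
    # the 10 bits available in the annotation type word.
    while sd > 1023:
        n = min(sd, 0x7FFFFFFF)
        data_bytes += [
            0,
            59 << 2,           # SKIP indicator in the high byte
            (n >> 16) & 255,   # high 16 bits, low byte first
            (n >> 24) & 255,
            (n >> 0) & 255,    # low 16 bits, low byte first
            (n >> 8) & 255,
        ]
        sd -= n

    # Annotation type word: bits 0-9 hold the remaining sample difference,
    # bits 10-15 hold the type code.
    data_bytes += [sd & 255, ((sd & 768) >> 8) + 4 * typecode]
    return data_bytes


print(encode_samptype(5, 1))     # small difference: a single 2-byte word
print(encode_samptype(2000, 1))  # large difference: SKIP element, then the word
```

With a type code of 1, a difference of 5 fits in one word ([5, 4]), while a difference of 2000 needs one SKIP element carrying the full difference, followed by a word with a remaining difference of 0.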
 

@cx1111 cx1111 merged commit 21a7b52 into MIT-LCP:main Aug 30, 2022
@Fegalf Fegalf deleted the field2bytes-optimization branch December 8, 2022 14:47