[WIP] Structuring 3 #760

ardunn · 2023-03-02T05:29:37Z

Structuring 3

A rewrite of the structuring class to make BEEP structuring more flexible, efficient, and idiomatic.

main contributions

Pros and Cons vs. previous structuring `BEEPDatapath`, and older `*CycleRun` framework

Pros

Modular, idiomatic objects representing cycler runs, cycles, and steps.
Code is organized, modular, and separated by function.
Much easier to work with subsets of data.
Hugely improved configuration for interpolation. Each set of cycles, each cycle, and each step can be configured to be interpolated individually, in groups, or all at once. Configuration is easy to specify, powerful, and readable.
Capacity for parallelization and dask-based memory management, since each cycle can be interpolated independently. Testing on a single 4-core laptop shows a ~10-50% reduction in compute time.
No more NaN issue. Everything is actually interpolated.
Retains from_file functionality to load many kinds of cycler data.
Improved clarity of core column headers (e.g., no more step_type and wondering what that actually means).
Memory use can be capped at whatever amount you specify, because all the "big" objects (both structured and unstructured data) are stored in dask Bags. To set a limit, declare a local dask.distributed client cluster beforehand and use the code as is, or configure it to run on a cluster. I've been able to get memory usage down to about 1/3 of what it was with BEEPDatapath.
Tests are organized sanely, rather than insanely.
Backwards compatible with both legacy ProcessedCyclerRun and BEEPDatapath objects previously structured and saved to disk.
Opens the door to programmatically representing protocols with run, cycle, and step granularity.
Gzipped files saved to disk, including structured data and summaries, are (usually) smaller than the original uncompressed raw cycler data, typically by about 20%.
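The parallelization point above rests on each cycle being interpolated without reference to any other cycle. A minimal sketch of that idea follows, using only the standard library as a stand-in for dask. The names `interpolate_linear`, `interpolate_cycle`, and `raw_cycles` are hypothetical and are not part of BEEP, and the data is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def interpolate_linear(xs, ys, grid):
    """Linearly interpolate ys(xs) onto an ascending grid; xs must be strictly increasing."""
    out, j = [], 0
    for g in grid:
        # Advance to the segment [xs[j], xs[j + 1]] that contains g.
        while j < len(xs) - 2 and g > xs[j + 1]:
            j += 1
        x0, x1, y0, y1 = xs[j], xs[j + 1], ys[j], ys[j + 1]
        out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out


def interpolate_cycle(item, n_points=5):
    """Interpolate one cycle onto its own evenly spaced grid (hypothetical helper)."""
    cycle_index, (t, v) = item
    span = t[-1] - t[0]
    grid = [t[0] + k * span / (n_points - 1) for k in range(n_points)]
    return cycle_index, interpolate_linear(t, v, grid)


# Made-up data: cycle_index -> (test_time, voltage).
raw_cycles = {
    1: ([0.0, 1.0, 2.0], [3.0, 3.5, 4.0]),
    2: ([0.0, 2.0, 4.0], [3.0, 3.6, 4.2]),
}

# No cycle depends on another, so the work can be mapped over cycles.
with ThreadPoolExecutor(max_workers=2) as pool:
    interpolated = dict(pool.map(interpolate_cycle, raw_cycles.items()))
```

Threads are used here only to show the independence of the per-cycle work. This sketch does not reproduce the dask Bag storage or the memory limits described above.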

Cons

Comparatively slow when the number of points in a single cycle or step is low, e.g., if there are 10000 steps in a cycle and each step has 2 points.
Takes longer to obtain views of data because dataframes from steps/cycles are collated together. Typically ~1-2s maximum on a laptop.
Takes longer to instantiate run objects because all Step/Cycle objects have to be instantiated. Typically ~30s on a laptop.
The implementation of Step can cause strange behavior when grouping multiple steps together (i.e., when steps must be grouped together to be interpolated based on chg/dchg step labels).
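One plausible way the grouping issue shows up: when steps sharing a label are concatenated, the combined interpolation axis may stop being monotonic. The sketch below is an assumption about the mechanism, not a description of BEEP's internals. The `steps` data, `group_axis`, and `is_strictly_monotonic` are all hypothetical.

```python
def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def group_axis(steps, label, field="voltage"):
    """Concatenate one field across all steps that carry the given label."""
    grouped = []
    for step in steps:
        if step["step_label"] == label:
            grouped.extend(step[field])
    return grouped


# Made-up cycle: a charge step, a discharge step, then another charge step.
steps = [
    {"step_label": "charge", "voltage": [3.0, 3.5, 4.0]},
    {"step_label": "discharge", "voltage": [4.0, 3.4, 2.8]},
    {"step_label": "charge", "voltage": [2.8, 3.2, 3.6]},
]

charge_axis = group_axis(steps, "charge")        # jumps back from 4.0 to 2.8
discharge_axis = group_axis(steps, "discharge")  # a single step, still monotonic

print(is_strictly_monotonic(charge_axis))
print(is_strictly_monotonic(discharge_axis))
```

A check like this could be run before interpolating a grouped set of steps, so that a discontinuous axis is flagged instead of silently producing odd output.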

problems:

When interpolating on step label (e.g., charge and discharge) along an axis on which several steps with the same label are discontinuous, weird stuff can happen. Cases:
- One charge step, one discharge step: no problem.
- $k$ charge steps, $k$ discharge steps: generally, no problem.
- A charge step, a discharge step, a charge step: interpolation is wonky.
Indeterminate steps are interpolated by default.
Validation currently requires the entire df to be loaded into memory at once, but this can in principle be fixed.
Cannot yet index steps or collections by raw index, e.g., run.raw.cycles[-1] does not work, but run.raw.cycles[1] will give you the cycle with cycle_index=1.
Indexing chained selections can be wonky, i.e., run.raw.cycles["regular"][3] won't give you the 3rd regular cycle; it will give you the regular cycle with cycle_index == 3. This can be fixed, but we need to determine what kind of behavior we actually want. For the time being, there is a method by_raw_index to retrieve various single objects by their raw index.
Indexing can sometimes return a single object (cycle or step) when you might be expecting an iterable.
Validation will still fail if cycle_index is not monotonically increasing, but this no longer matters for interpolation.
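The difference between indexing by cycle_index and indexing by raw position can be shown with a toy container. This is a hypothetical stand-in, not the BEEP class: `ToyCycles` and its data are invented, and the `by_raw_index` signature is assumed from its name.

```python
class ToyCycles:
    """Toy collection: integer keys match cycle_index, string keys filter by label."""

    def __init__(self, cycles):
        self._cycles = list(cycles)

    def __getitem__(self, key):
        if isinstance(key, str):
            # Chained selection: keep only cycles carrying this label.
            return ToyCycles(c for c in self._cycles if c["cycle_label"] == key)
        for c in self._cycles:
            if c["cycle_index"] == key:
                return c
        raise KeyError(key)

    def by_raw_index(self, position):
        """Positional access, independent of cycle_index (assumed signature)."""
        return self._cycles[position]


cycles = ToyCycles([
    {"cycle_index": 1, "cycle_label": "regular"},
    {"cycle_index": 2, "cycle_label": "diagnostic"},
    {"cycle_index": 3, "cycle_label": "regular"},
    {"cycle_index": 5, "cycle_label": "regular"},
])

by_label_then_index = cycles["regular"][3]            # cycle_index == 3, the 2nd regular cycle
third_regular = cycles["regular"].by_raw_index(2)     # 3rd regular cycle, cycle_index == 5
last_cycle = cycles.by_raw_index(-1)                  # positional, so negative indices work

try:
    cycles[-1]                                        # no cycle has cycle_index == -1
    negative_lookup_failed = False
except KeyError:
    negative_lookup_failed = True
```

Whether the integer key after a label selection should mean "cycle_index" or "position within the selection" is exactly the open design question noted above.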

other notes

step_index standard column has been renamed to step_code
all data includes step_counter calculated from step_code
EIS functionality is moved to a different directory for now, since it is incomplete
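As a rough illustration of how a step_counter might be derived from step_code, the sketch below increments a counter each time the code changes between consecutive rows. This rule is an assumption made for illustration, and the actual calculation in this PR may differ (for example, it may reset per cycle). `step_counter_from_codes` is a hypothetical name.

```python
from itertools import groupby


def step_counter_from_codes(step_codes):
    """Assign a running counter to each consecutive run of identical step codes."""
    counters = []
    for counter, (_code, run) in enumerate(groupby(step_codes)):
        counters.extend(counter for _ in run)
    return counters


# Made-up column: step code 1 appears twice, in two separate runs.
step_codes = [1, 1, 2, 2, 2, 1, 1, 3]
step_counter = step_counter_from_codes(step_codes)
print(step_counter)  # [0, 0, 1, 1, 1, 2, 2, 3]
```

The repeated code 1 maps to two different counter values, which is one reason a code is a better name than an index for this column.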

Examples: TBA

ardunn added 15 commits March 1, 2023 18:38

initial draft for structure v3

8a8274c

adding some bugfixes and nice-to-haves in early draft of structuringv3

69090d3

various basic improvements to __repr__s and functionality

054130a

double structure

66b13c8

allow for single items to be returned thru DFSelectorAggregator so attrs can be accessed for single items

f498f77

allow for single items to be returned thru DFSelectorAggregator so attrs can be accessed for single items

8e8f0ba

update structuring config to be more sane

55094cb

update config setting for cycles

148f413

update config setting for cycles

b3d811e

[skip ci]

a89bff8

renmove multiple ways of setting configs, separate configs into Step-level and Cycle-level

21868fb

working but untested sane methods for both paradigms of interpolation

ea3c42e

removal of update_nested as it is no longer needed

97d320e

add a basic assignment for DiagnosticConfig in Run, untested

759bf35

remove unneeded kwargs from Diagnostic config [skip ci]

7230de8

ardunn self-assigned this Apr 23, 2023

ardunn added structuring priority labels Apr 23, 2023

ardunn added 12 commits April 22, 2023 19:05

fix problem with empty multistep dataframes

06497f2

[skip ci]

6c024f2

enable dask bag for all cyclecontainers

2edf0e8

update Run to include dask bags inside cyclecontainers

8e785a6

[skip ci]

fe1bc3e

separate modules

327e09a

[skip ci]

4776a19

creating as_dict/from_dict for Monty compatibility

6cc0c49

[skip ci]

c963f90

move validate

95390bb

working diagnostic setting and usage via cycle_label

34eb25a

update diagnostic and convert Maccor to Run

2eeac32

ardunn added 19 commits May 16, 2023 22:43

working basic validation and diagnostic w/o tests

57c3424

clean up maccor

1fc80f5

implementing basic tests

36a9f73

memory friendly summary method for run

8394f8c

include diagnostic summary in run

dd2408d

slight refactoring, fixing by_raw_index

437399d

[skip ci] fix iterable indexing field

a7161f3

[skip ci] working serialization with diagnostic

2ed43fe

add cycles_to_capacities (and inverse), with better presentation and naming

82fbc95

update run as_dict

4df5a7d

make DFSelectorAggregator more idiomatic behavior

07ca92d

Fix some bugs with cycles_container, move configs to constants, and add legacy conversions

182d455

various bugfizxes

9e215ae

update constants

60800e3

update util

333ca88

refactor step_index -> step_code, which is more reflective of what this field usually signifies (it is generally not an index)

a616b6c

add step test files

b919b14

update test_step, add new multistep file

c51f7cd

[skip ci] complete test_step, move .uniques to class attributes for Step, MultiStep, and Cycle

6a28ff2
