Document arrow-rs architecture and structure #4071

Open
alamb opened this issue Apr 12, 2023 · 5 comments
Labels: documentation (Improvements or additions to documentation), enhancement (Any new improvement worthy of a entry in the changelog)

Comments

@alamb
Contributor

alamb commented Apr 12, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The underlying implementation of arrow arrays has changed significantly due to the work in #3880.

There are now several important types, such as ScalarBuffer, NullBuffer, Buffer, PrimitiveArray, etc., that underlie the arrays in addition to the "classic" ArrayData. In addition, after #3879 is complete, I believe these types will be more publicly exposed through the various Arrow APIs.
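
For illustration, here is a rough, self-contained Rust sketch of how these types could layer on one another. It is a toy model under stated assumptions, not the arrow-rs implementation: every `Toy*` type and method below is a hypothetical stand-in, the element type is fixed to `i32`, and offsets, alignment, and ArrayData are left out entirely.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an untyped, reference-counted run of bytes.
#[derive(Clone, Debug)]
struct ToyBuffer {
    bytes: Arc<Vec<u8>>,
}

/// Hypothetical stand-in for a typed view over a buffer (i32 only here).
#[derive(Clone, Debug)]
struct ToyScalarBuffer {
    buffer: ToyBuffer,
}

impl ToyScalarBuffer {
    fn from_values(values: &[i32]) -> Self {
        // Store each value as 4 little-endian bytes, back to back.
        let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
        Self { buffer: ToyBuffer { bytes: Arc::new(bytes) } }
    }

    fn len(&self) -> usize {
        self.buffer.bytes.len() / 4
    }

    fn get(&self, i: usize) -> i32 {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&self.buffer.bytes[i * 4..i * 4 + 4]);
        i32::from_le_bytes(raw)
    }
}

/// Hypothetical stand-in for a validity bitmap: bit i set means slot i is non-null.
#[derive(Clone, Debug)]
struct ToyNullBuffer {
    bits: ToyBuffer,
    len: usize,
}

impl ToyNullBuffer {
    fn from_validity(valid: &[bool]) -> Self {
        let mut bytes = vec![0u8; (valid.len() + 7) / 8];
        for (i, &is_valid) in valid.iter().enumerate() {
            if is_valid {
                bytes[i / 8] |= 1 << (i % 8);
            }
        }
        Self { bits: ToyBuffer { bytes: Arc::new(bytes) }, len: valid.len() }
    }

    fn is_valid(&self, i: usize) -> bool {
        self.bits.bytes[i / 8] & (1 << (i % 8)) != 0
    }

    fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}

/// Hypothetical stand-in for a primitive array: typed values plus optional validity.
#[derive(Clone, Debug)]
struct ToyPrimitiveArray {
    values: ToyScalarBuffer,
    nulls: Option<ToyNullBuffer>,
}

impl ToyPrimitiveArray {
    fn from_options(items: &[Option<i32>]) -> Self {
        // Null slots still occupy space in the values buffer; 0 is a placeholder.
        let values: Vec<i32> = items.iter().map(|v| v.unwrap_or(0)).collect();
        let valid: Vec<bool> = items.iter().map(|v| v.is_some()).collect();
        // Skip the bitmap entirely when every slot is valid.
        let nulls = if valid.iter().all(|&v| v) {
            None
        } else {
            Some(ToyNullBuffer::from_validity(&valid))
        };
        Self { values: ToyScalarBuffer::from_values(&values), nulls }
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    fn null_count(&self) -> usize {
        self.nulls.as_ref().map_or(0, |n| n.null_count())
    }

    fn get(&self, i: usize) -> Option<i32> {
        match &self.nulls {
            Some(n) if !n.is_valid(i) => None,
            _ => Some(self.values.get(i)),
        }
    }
}
```

In this sketch, `ToyPrimitiveArray::from_options(&[Some(1), None, Some(3)])` builds one values buffer and one validity bitmap, and `get(1)` consults the bitmap before reading the value. Cloning any of the `Toy*` types only bumps the `Arc` reference count, which is the property that makes this kind of layering cheap to pass around.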

Describe the solution you'd like
I would like documentation / diagrams / something that briefly explains the key structures and how they are related to each other.

Perhaps we can take inspiration (or copy/modify) the wonderful guide that @jorgecarleitao wrote for arrow2: https://jorgecarleitao.github.io/arrow2/main/guide/

Describe alternatives you've considered

Additional context
See details on #4061 (comment)

@alamb alamb added documentation Improvements or additions to documentation enhancement Any new improvement worthy of a entry in the changelog labels Apr 12, 2023
@alamb
Contributor Author

alamb commented Apr 12, 2023

cc @tustvold

tustvold added a commit to tustvold/arrow-rs that referenced this issue May 10, 2023
tustvold added a commit that referenced this issue May 11, 2023
* Update docs (#4071)

* Review feedback
@xxchan
Contributor

xxchan commented Jun 5, 2023

It would be helpful to have such a doc 👀. Specifically, I feel a little confused about "buffer", "data" (is it still useful after the recent refactoring?) and "array", and I'm not sure which to use. arrow2's description of the "low-level"/"high-level" API is easier to understand 😄.

@tustvold
Contributor

tustvold commented Jun 5, 2023

I believe https://docs.rs/arrow/latest/arrow/#columnar-format and, by extension, the linked https://docs.rs/arrow-array/40.0.0/arrow_array/index.html are such a doc, but please let me know if anything isn't clear or could do with additional clarification.

@xxchan
Contributor

xxchan commented Jun 5, 2023

Thanks. The docs are indeed very helpful! However, my user journey is like:

Wow, there are so many sub-crates! And ... what on earth do they mean? (So I didn't browse the introductory content below at the beginning..)

image

Well, let me go to arrow-buffer / arrow-data and check them. Oops, there isn't much useful information there either. What is "Buffer abstraction"?

image

So it seems good docs exist, but they are not accessible enough.

Ways to improve that I can come up with for now:

  • Better "introduction" section in crate-level docs.
    • The root arrow crate can group sub-crates (e.g., low-level/high-level/compute kernels, like arrow2's docs) and add more description, instead of simply enumerating them.
    • Sub-crates may need more explanation (at least a note pointing to the relevant struct's docs for more details).
  • It seems the high-level API (Array) docs explain the low-level API (Buffer), but not the reverse. Maybe a backlink can be added to explain the relationship.

@tustvold
Contributor

tustvold commented Jun 5, 2023

Thank you, that is very useful feedback. I'll try to get something up to address this in the coming days 👍

3 participants