Cache object code in memory instead of entire module. #4

Open
wants to merge 126 commits into
base: master
from

augustoasilva

No description provided.

Ying Zhou and others added 30 commits May 27, 2021 17:52
Closes apache#10157 from mathyingzhou/ARROW-9299

Lead-authored-by: Ying Zhou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
For serial CSV readers, track the absolute row number and report it in errors encountered during parsing or conversion.

I tried to get row numbers for the parallel reader as well, but the only approach I could think of was to add delimiter counting to the Chunker, and that seemed to add more complexity than I wanted.
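
To illustrate the idea for the serial case, here is a minimal Python sketch, not the Arrow C++ implementation. The names `RowError` and `read_ints_serial` are hypothetical, and the error message format is an assumption. Because a serial reader sees rows in order, a running counter is enough to attach the absolute row number to any parse or conversion error.

```python
import csv
import io


class RowError(ValueError):
    """Hypothetical error type that carries the absolute row number in its message."""


def read_ints_serial(text, expected_columns):
    """Read CSV text one row at a time, converting every field to int.

    Rows are consumed strictly in order, so the loop counter is the
    absolute (1-based) row number of the record being processed.
    """
    rows = []
    reader = csv.reader(io.StringIO(text))
    for row_number, fields in enumerate(reader, start=1):
        # Parsing error: wrong number of columns in this row.
        if len(fields) != expected_columns:
            raise RowError(
                f"Row #{row_number}: expected {expected_columns} columns, "
                f"got {len(fields)}"
            )
        # Conversion error: a field that is not a valid integer.
        try:
            rows.append([int(field) for field in fields])
        except ValueError as exc:
            raise RowError(f"Row #{row_number}: {exc}") from exc
    return rows
```

A parallel reader processes chunks independently, so a chunk does not know how many rows precede it unless something counts them up front, which is the extra complexity mentioned above.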

Closes apache#10321 from n3world/ARROW-12675-report_rows

Authored-by: Nate Clark <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
…_id's

Questions:

- This is my first PR in the parquet namespace, so I'm not sure of all the special rules.
- The field ID generation doesn't happen in the `parquet::schema` -> `arrow::schema` phase but in the `parquet::format::schema` -> `parquet::schema` phase. So in order to test it, I had to add `#include "generated/parquet_types.h"` to `arrow_schema_test.cc`, and I wasn't sure whether I was allowed to reference the `generated/*` files like that.
- This PR simply allows user-specified field IDs to be persisted. Is that sufficient for PARQUET-1798 (the title is rather general), or should I open a dedicated JIRA?

Closes apache#10289 from westonpace/feature/PARQUET-1798-field-id-assignment

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
* Download URL is wrong
* Downloaded packages aren't removed

Closes apache#10418 from kou/release-csharp

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Closes apache#10343 from thisisnic/ARROW-12758_examples

Lead-authored-by: Nic Crane <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
This runs reverse dependency checks using {revdepcheck}. It installs a release version of arrow and the current development version (i.e. from the git checkout), and then runs checks on each of the reverse dependencies, first with the release version (called "old" in {revdepcheck}'s terms) and then with the development version ("new" in {revdepcheck}'s terms). It then compares the outputs and fails only if there is a failure in the new check that is not in the old check.

I've customized the output a bit so that it prints any errors that come up in either check (in the revdepcheck problems step), which makes them easier to diagnose, but the job will only fail if there are new errors.
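
The comparison rule described above can be sketched in Python as follows. This is only an illustration of the logic, not how {revdepcheck} is implemented. The function names and the shape of the inputs (a dict from package name to a set of problem messages) are assumptions made for the example.

```python
def new_failures(old, new):
    """Return problems that appear in the "new" check but not in the "old" one.

    Both arguments are assumed to map a reverse dependency's name to the
    set of problem messages reported for it.
    """
    result = {}
    for package, problems in new.items():
        added = set(problems) - set(old.get(package, ()))
        if added:
            result[package] = added
    return result


def should_fail(old, new):
    """The job fails only when the development version introduces new problems."""
    return bool(new_failures(old, new))
```

With this rule, a problem that already exists with the release version is still reported for diagnosis but does not fail the job.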

One thing that I tried and was unable to do is find a way to cache packages and info across runs. The GitHub cache action will create a cache, but because of how the jobs are run on crossbow (i.e. on different branches), the caches are never accessible in different runs. I've kept the caching step in for now. If we could find a way to (manually?) run this on the main branch, like https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/cache_vcpkg.yml, before we use this heavily (i.e. likely only around a release), that would create a cache that could be used to speed up some of the jobs.

Closes apache#10345 from jonkeane/ARROW-12569-revdepcheck

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
Adjust the R version used so that binary arrow packages can be installed from RSPM. Also make a small adjustment to the tests so that they do not require a fixed order of attributes (the order changed slightly in version 3.0.0).

Closes apache#10409 from jonkeane/ARROW-12883-version-compatibility

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
Closes apache#10368 from thisisnic/ARROW-12841_examples_part_2

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
…nd is_in

Closes apache#10383 from thisisnic/ARROW-12777_match_arrow_is_in

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
Closes apache#10419 from raybellwaves/docs-np-import

Authored-by: Ray Bell <[email protected]>
Signed-off-by: David Li <[email protected]>
Closes apache#10413 from jonkeane/ARROW-12894

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
…ory' into feature/cache-object-code-in-memory

# Conflicts:
#	cpp/src/gandiva/base_object_cache.h
#	cpp/src/gandiva/cache.h
#	cpp/src/gandiva/engine.h
#	cpp/src/gandiva/lru_cache.h
#	cpp/src/gandiva/projector.cc
#	cpp/src/gandiva/projector.h
anthonylouisbsb pushed a commit that referenced this pull request Jun 16, 2021
Before change:

```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in
    #1 0x7f28ae5826f4 in
    #2 0x7f28ae57fa5d in
    #3 0x7f28ae58cb0f in
    #4 0x7f28ae58bda0 in
    ...
```

After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
    #1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
    #2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
    #3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
    #4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
    ...
```

Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>