ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. #4484

Closed
wants to merge 19 commits into from

emkornfield
Contributor

This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for their purposes, and provide external static methods to check for particular types of errors).
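
As a rough illustration of the extension model described above, the following self-contained sketch shows a subsystem wrapping the status constructor and exposing free functions that check for particular errors. Everything in it is an assumption made for illustration: the `sketch` namespace, the `store` subsystem, `StoreErrorCode`, `MakeStoreError`, `IsObjectExists`, the reduced `StatusCode` enum, and the `type_id()` string used to recognize the detail type without RTTI. It is not the code from this PR.

```cpp
#include <cstring>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Top-level codes stay generic; subsystem-specific information lives in the detail.
enum class StatusCode : char { OK = 0, KeyError = 1, CapacityError = 2, AlreadyExists = 3 };

class StatusDetail {
 public:
  virtual ~StatusDetail() = default;
  // Stable identifier so callers can recognize a detail type without RTTI.
  virtual const char* type_id() const = 0;
  virtual std::string ToString() const = 0;
};

class Status {
 public:
  Status() = default;
  Status(StatusCode code, std::string msg, std::shared_ptr<StatusDetail> detail = nullptr)
      : code_(code), msg_(std::move(msg)), detail_(std::move(detail)) {}

  bool ok() const { return code_ == StatusCode::OK; }
  StatusCode code() const { return code_; }
  const std::string& message() const { return msg_; }
  const std::shared_ptr<StatusDetail>& detail() const { return detail_; }

 private:
  StatusCode code_ = StatusCode::OK;
  std::string msg_;
  std::shared_ptr<StatusDetail> detail_;
};

namespace store {  // hypothetical subsystem

enum class StoreErrorCode : char { ObjectExists = 1, ObjectNonexistent = 2, StoreFull = 3 };

constexpr char kStoreDetailTypeId[] = "sketch::store::StoreStatusDetail";

class StoreStatusDetail : public StatusDetail {
 public:
  explicit StoreStatusDetail(StoreErrorCode code) : code_(code) {}
  const char* type_id() const override { return kStoreDetailTypeId; }
  std::string ToString() const override {
    switch (code_) {
      case StoreErrorCode::ObjectExists: return "object already exists";
      case StoreErrorCode::ObjectNonexistent: return "object does not exist";
      case StoreErrorCode::StoreFull: return "store is full";
    }
    return "unknown store error";
  }
  StoreErrorCode code() const { return code_; }

 private:
  StoreErrorCode code_;
};

// Wraps the Status constructor: picks a generic top-level code and attaches the detail.
inline Status MakeStoreError(StoreErrorCode code, std::string msg) {
  StatusCode top = StatusCode::KeyError;
  if (code == StoreErrorCode::ObjectExists) top = StatusCode::AlreadyExists;
  if (code == StoreErrorCode::StoreFull) top = StatusCode::CapacityError;
  return Status(top, std::move(msg), std::make_shared<StoreStatusDetail>(code));
}

// Compares the type identifier first, so the static_cast below is only reached
// when the detail really is a StoreStatusDetail.
inline bool IsStoreError(const Status& s, StoreErrorCode code) {
  const auto& d = s.detail();
  if (d == nullptr || std::strcmp(d->type_id(), kStoreDetailTypeId) != 0) return false;
  return static_cast<const StoreStatusDetail&>(*d).code() == code;
}

inline bool IsObjectExists(const Status& s) {
  return IsStoreError(s, StoreErrorCode::ObjectExists);
}
inline bool IsStoreFull(const Status& s) {
  return IsStoreError(s, StoreErrorCode::StoreFull);
}

}  // namespace store
}  // namespace sketch
```

In this sketch a caller that only understands generic codes can still branch on `code()`, while code that knows about the subsystem can call `store::IsObjectExists(status)` without touching the detail class directly.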

@emkornfield
Contributor Author

This also assumes subsystem codes don't map to top level codes (I think I would prefer they do, but I could see them going both ways).

@emkornfield emkornfield changed the title [Proposal/Sketch] ARROW-4036: [C++]not-quite pluggable error codes [Proposal/Sketch] ARROW-4036: [C++] not-quite pluggable error codes Jun 6, 2019
@@ -92,6 +92,7 @@ enum class StatusCode : char {
SerializationError = 11,
PythonError = 12,
RError = 13,
SubsystemError = 14,
Member

I don't think that's the right approach. If I have a Python ValueError, I want it both to map to StatusCode::Invalid and to retain the original Python exception.

Contributor Author

Yep, I tend to agree. So two issues to improve on:

  • top-level codes should be orthogonal to subsystem codes
  • potentially need the ability to link directly to the underlying exception (e.g. a PyObject for a Python error)

Any other high-level concepts or functionality missing?

Member

Right. I also think subsystem codes can be optional (for example, Python would just use the underlying exception instead of defining integer codes).

So instead of having a subsystem code, we could have an opaque optional detail instance, e.g.:

class StatusDetail {
 public:
  virtual ~StatusDetail();
  virtual std::string ToString() const = 0;
};

Contributor Author

Yeah, I came to the same conclusion after stepping away from the computer. Will try to update tonight.

Contributor Author

I've updated this to have just a status detail that has a message accessor, and unified the current implementation with a private implementation of the interface. I can see arguments for keeping a separate message member as well, so I'm happy to add it back.

Member

I would be in favour of having a separate message member, so that the detail can remain null if there's no error-specific detail to carry.
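
A minimal sketch of the layout suggested here, assuming a status that always carries a code and a message and only optionally carries a detail object. The `sketch` namespace, the two-value `StatusCode` enum, `CodeAsString`, and `ForeignErrorDetail` (a stand-in for something like a captured Python exception) are hypothetical names chosen for illustration, not the classes from this PR.

```cpp
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Reduced to two codes to keep the example short.
enum class StatusCode : char { OK = 0, Invalid = 4 };

inline const char* CodeAsString(StatusCode code) {
  return code == StatusCode::OK ? "OK" : "Invalid";
}

class StatusDetail {
 public:
  virtual ~StatusDetail() = default;
  virtual std::string ToString() const = 0;
};

// Stand-in for an error captured from another runtime, e.g. a Python ValueError.
class ForeignErrorDetail : public StatusDetail {
 public:
  explicit ForeignErrorDetail(std::string type_name) : type_name_(std::move(type_name)) {}
  std::string ToString() const override { return "Foreign exception: " + type_name_; }

 private:
  std::string type_name_;
};

class Status {
 public:
  Status() = default;
  // The detail defaults to null, so ordinary errors pay nothing extra.
  Status(StatusCode code, std::string msg, std::shared_ptr<StatusDetail> detail = nullptr)
      : code_(code), msg_(std::move(msg)), detail_(std::move(detail)) {}

  bool ok() const { return code_ == StatusCode::OK; }
  StatusCode code() const { return code_; }
  const std::string& message() const { return msg_; }
  const std::shared_ptr<StatusDetail>& detail() const { return detail_; }

  // The message is always printed; the detail text is appended only when set.
  std::string ToString() const {
    if (ok()) return "OK";
    std::string out = std::string(CodeAsString(code_)) + ": " + msg_;
    if (detail_ != nullptr) out += ". Detail: " + detail_->ToString();
    return out;
  }

 private:
  StatusCode code_ = StatusCode::OK;
  std::string msg_;
  std::shared_ptr<StatusDetail> detail_;
};

}  // namespace sketch
```

With this shape, a foreign error can map to the generic `Invalid` code while still keeping a handle on where it came from, and a plain `Status(StatusCode::Invalid, "bad input")` leaves `detail()` null.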

Contributor Author

Added the msg field back.


};

enum class Subsystem : char {
Member

Does Flight also warrant a subsystem? It would be nice to express errors like unauthorized, service unavailable/overloaded, etc. (tagging my work account @lihalite)

Contributor Author

Going to remove this and go with the approach suggested by Antoine.

Contributor Author

@lihalite when this lands, I think some codes should probably be top level (unavailable seems generic across multiple service type things) but some might be flight specific.

Ah ok, great. I started sketching out a set of codes for Flight/Java, I can pick up the C++/Python side too once this lands.

@codecov-io

codecov-io commented Jun 8, 2019

Codecov Report

Merging #4484 into master will increase coverage by 21.19%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           master    #4484       +/-   ##
===========================================
+ Coverage   67.96%   89.15%   +21.19%     
===========================================
  Files         676      588       -88     
  Lines       74307    70058     -4249     
  Branches     1253        0     -1253     
===========================================
+ Hits        50501    62460    +11959     
+ Misses      23559     7598    -15961     
+ Partials      247        0      -247
| Impacted Files | Coverage Δ | |
|---|---|---|
| cpp/src/arrow/status-test.cc | `100% <100%> (ø)` | |
| cpp/src/arrow/status.cc | `37.75% <100%> (+24.71%)` | ⬆️ |
| cpp/src/arrow/status.h | `95.69% <100%> (+22.28%)` | ⬆️ |
| cpp/src/arrow/array/builder_union.cc | `0% <0%> (-100%)` | ⬇️ |
| cpp/src/arrow/csv/reader.h | `0% <0%> (-100%)` | ⬇️ |
| cpp/src/arrow/csv/reader.cc | `0% <0%> (-92.9%)` | ⬇️ |
| cpp/src/arrow/adapters/orc/adapter_util.cc | `15.74% <0%> (-64.97%)` | ⬇️ |
| cpp/src/arrow/array/builder_union.h | `0% <0%> (-61.91%)` | ⬇️ |
| cpp/src/arrow/util/cpu-info.h | `0% <0%> (-50%)` | ⬇️ |
| cpp/src/arrow/csv/options.h | `66.66% <0%> (-33.34%)` | ⬇️ |

... and 677 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2ef7d9...e20eb9b. Read the comment docs.

@emkornfield emkornfield changed the title [Proposal/Sketch] ARROW-4036: [C++] not-quite pluggable error codes ARROW-4036: [C++] Pluggable Status message, buy exposing an abstract delegate class. Jun 9, 2019
@emkornfield
Contributor Author

If you want me to try to clean up one of the subsystems (Python?) in this PR, let me know and I can give it a shot.

@emkornfield emkornfield changed the title ARROW-4036: [C++] Pluggable Status message, buy exposing an abstract delegate class. ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. Jun 9, 2019
Member

@pitrou pitrou left a comment

I think this looks nice on the principle. It would be useful to validate this by migrating the Gandiva, Plasma... status codes.

cpp/src/arrow/status-test.cc (outdated, resolved)
cpp/src/arrow/status.h (outdated, resolved)
cpp/src/arrow/status.h (resolved)
@emkornfield
Contributor Author

emkornfield commented Jun 12, 2019

I think this looks nice on the principle. It would be useful to validate this by migrating the Gandiva, Plasma... status codes.

Will give this a try. I'm not sure about downstream dependencies, though.

@emkornfield emkornfield force-pushed the status_code_proposal branch from e20eb9b to c78f74d Compare June 12, 2019 06:37
@emkornfield emkornfield changed the title ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. ARROW-4036: [WIP] [C++] Pluggable Status message, by exposing an abstract delegate class. Jun 16, 2019
@emkornfield emkornfield force-pushed the status_code_proposal branch from ff5fa6e to 814d55f Compare June 17, 2019 06:32
@emkornfield emkornfield force-pushed the status_code_proposal branch from 814d55f to 93737ac Compare June 22, 2019 07:59
@emkornfield emkornfield changed the title ARROW-4036: [WIP] [C++] Pluggable Status message, by exposing an abstract delegate class. ARROW-4036: [C++][Python] Pluggable Status message, by exposing an abstract delegate class. Jun 22, 2019
@emkornfield
Contributor Author

@pitrou this demonstrates using the StatusDetail with Python and Plasma. This breaks some C++ and Python APIs; I'm not sure if we are OK with that.

If this approach is OK, I will do a follow-up PR to remove Gandiva codes as well.

CC @pcmoritz

@emkornfield emkornfield changed the title ARROW-4036: [C++][Python] Pluggable Status message, by exposing an abstract delegate class. ARROW-4036: [WIP][C++] Pluggable Status message, by exposing an abstract delegate class. Jun 22, 2019
@emkornfield
Contributor Author

putting back in WIP status until I can get CI green (python tests passed for me locally)

@emkornfield emkornfield force-pushed the status_code_proposal branch 2 times, most recently from e2f9a14 to 12c5c61 Compare June 28, 2019 15:35
@emkornfield emkornfield force-pushed the status_code_proposal branch from 12c5c61 to 8433843 Compare July 2, 2019 07:24
@emkornfield emkornfield changed the title ARROW-4036: [WIP][C++] Pluggable Status message, by exposing an abstract delegate class. ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. Jul 2, 2019
@emkornfield
Contributor Author

@pitrou after a lot of debugging I figured out why the builds were failing, only to discover you had fixed it with your PR to provide better errors (there was a copy of Status in a macro that I did not find initially). Thank you! I think this is ready to review. I'll file a follow-up JIRA once this is merged to migrate Gandiva away from this as well.

@pitrou
Member

pitrou commented Jul 2, 2019

Will review, thank you :-)

@emkornfield
Contributor Author

I'm looking into the lint failure, but I'm guessing I will need to make some changes anyway.

@pitrou pitrou force-pushed the status_code_proposal branch from 68c9d62 to ea56d1e Compare July 2, 2019 16:01
@pitrou
Member

pitrou commented Jul 2, 2019

I've done my thing on Python. Let's wait for CI a bit.

@pitrou
Member

pitrou commented Jul 2, 2019

@ursabot build test

@ursabot

ursabot commented Jul 2, 2019

Got unexpected extra argument (test)

@pitrou
Member

pitrou commented Jul 2, 2019

@ursabot build

@kszucs
Member

kszucs commented Jul 2, 2019

@ursabot build

@pitrou
Member

pitrou commented Jul 2, 2019

The failed ursabot builds are unrelated, @kszucs is busy doing things on BuildBot.
The failed AppVeyor build is a transient download failure

Travis-CI build: https://travis-ci.org/pitrou/arrow/builds/553376058
AppVeyor build: https://ci.appveyor.com/project/pitrou/arrow/builds/25698396

Member

@pitrou pitrou left a comment

A couple more things to change. I can tackle those tomorrow if you like.

cpp/src/plasma/common.h (outdated, resolved)
cpp/src/plasma/common.cc (outdated, resolved)
python/pyarrow/__init__.py (outdated, resolved)
return std::string("Python exception: ") + ty->tp_name;
}

void RestorePyError() const {
Contributor Author

Thanks for doing this. I started down this path and decided to keep it simpler because there were semantics of the GIL I was hazy on.

@pitrou
Member

pitrou commented Jul 3, 2019

I'm going to merge once CI is green.

@pitrou pitrou closed this in a9a82ec Jul 3, 2019
kou pushed a commit that referenced this pull request Jul 4, 2019
…elegate class.

This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for their purposes, and provide external static methods to check for particular types of errors).

Author: Micah Kornfield <[email protected]>
Author: Antoine Pitrou <[email protected]>

Closes #4484 from emkornfield/status_code_proposal and squashes the following commits:

4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module
a66f999 <Micah Kornfield> make format
040216d <Micah Kornfield> fixes for comments outside python
729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully)
ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status())
21e1b95 <Micah Kornfield> fix compilation
9c905b0 <Micah Kornfield> fix lint
74d563c <Micah Kornfield> fixes
85786ef <Micah Kornfield> change messages
3626a90 <Micah Kornfield> try removing message
a4e6a1f <Micah Kornfield> add logging for debug
4586fd1 <Micah Kornfield> fix typo
8f011b3 <Micah Kornfield> fix status propagation
317ea9c <Micah Kornfield> fix complie
9f59160 <Micah Kornfield> don't make_shared inline
484b3a2 <Micah Kornfield> style fix
14e3467 <Micah Kornfield> dont rely on rtti
cd22df6 <Micah Kornfield> format
dec4585 <Micah Kornfield> not-quite pluggable error codes
kszucs pushed a commit that referenced this pull request Jul 22, 2019
suquark added a commit to suquark/arrow that referenced this pull request Oct 25, 2019
revert some changes

replace event loop with asio

update plasma store protocol

fix qualifiers

update plasma store client protocol

Remove all native socket operations.

Implement general io support

Fix bugs

fix all compiling bugs

fix bug

Fix all tests.

Add license header.

try to fix cmake

try to make asio standalone

simplify code

add license

update url

lint

lint & fix

fix

restore entrypoint

remove unused unix headers

fix

Update LICENSE

fix

rename

handle signal

move the function to its original place

fix doc

hide classes

stop installing asio headers

fix doc

reverse changes

minor fix

tiny fix

fix comments

minor fix

resolve conflicts

fix

optimize cmake

fix

update formatter

fix according to github comments

lint

prevent the store from dying

fix ARROW_CHECK

Fix

ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class.

fix merge

fix

update

update

update

update

fix

update

fix

update

update

revert some unknown comments

rebase CMakeLists

rebase eviction_policy.h

rebase CMakeLists

rebase
jikunshang pushed a commit to jikunshang/arrow that referenced this pull request Dec 24, 2019
jorisvandenbossche pushed a commit that referenced this pull request Sep 15, 2020
…pe (int/string) Pandas dataframe to pyarrow Table

This PR homogenizes error messages for mixed-type `Pandas` inputs to `pa.Table`.

The message for `Pandas` column with `int` followed by `string`  is now
```
In [2]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19, 'a']}))
(... traceback...)
ArrowInvalid: ('Could not convert a with type str: tried to convert to int', 'Conversion failed for column a with type object')
```
the same as for `double` followed by `string`:
```
In [3]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19.0, 'a']}))
(... traceback...)
ArrowInvalid: ('Could not convert a with type str: tried to convert to double', 'Conversion failed for column a with type object')
```

As a side effect, this snippet [xref #5866, ARROW-7168] now throws an `ArrowInvalid` (has been `FutureWarning` since 0.16):
```
In [8]: cat = pd.Categorical.from_codes(np.array([0, 1], dtype='int8'), np.array(['a', 'b'], dtype=object))
   ...: typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64())
   ...: result = pa.array(cat, type=typ)
(... traceback...)
ArrowInvalid: Could not convert a with type str: tried to convert to int
```
Finally, this *does* break a test [xref #4484, ARROW-4036] - see code comment

Closes #8044 from arw2019/ARROW-7663

Authored-by: arw2019 <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021