Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add merge type validation on pandas.merge #59435

Merged

Conversation

KevsterAmp
Copy link
Contributor

Tests on pandas.merge

(pandas-dev) kev@mac pandas % pytest pandas/tests/reshape/merge/test_merge.py
+ /opt/homebrew/Caskroom/miniforge/base/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command
============================================================================================================ test session starts =============================================================================================================
platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
PyQt5 5.15.9 -- Qt runtime 5.15.8 -- Qt compiled 5.15.8
rootdir: /Users/kev/self/pandas
configfile: pyproject.toml
plugins: localserver-0.0.0, qt-4.4.0, cov-5.0.0, anyio-4.4.0, hypothesis-6.108.7, cython-0.3.1, xdist-3.6.1
collected 939 items

pandas/tests/reshape/merge/test_merge.py ...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

------------------------------------------------------------------------------------------ generated xml file: /Users/kev/self/pandas/test-data.xml ------------------------------------------------------------------------------------------
============================================================================================================ slowest 30 durations ============================================================================================================
0.02s call     pandas/tests/reshape/merge/test_merge.py::TestMergeDtypes::test_merge_ea_with_string[inner-str0]
0.02s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_indexes
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_validation
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_indicator_result_integrity
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_indicator
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_index_many_to_many
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_left_empty_right_notempty

(23 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================================================================================ 939 passed in 1.37s =============================================================================================================

Error Reproducible Example

from datetime import datetime

data_A = list()
data_B = list()

for i in range(10):
    data_A.append({
        "id": i,
        "created_date": datetime.today(),
        "created_at": datetime.now(),
    })

    data_B.append({
        "id": i if i % 2 == 0 else 3*i,
        "created_date": datetime.today(),
        "created_at": datetime.now(),
    })

df_A = pd.DataFrame.from_dict(data_A)

df_B = pd.DataFrame.from_dict(data_B)

df_A.merge(df_B, how="full", on="id")
Traceback (most recent call last):
  File "/Users/kev/self/pandas/test.py", line 29, in <module>
    df_A.merge(df_B, how="full", on="id")
  File "/Users/kev/self/pandas/pandas/core/frame.py", line 10807, in merge
    return merge(
  File "/Users/kev/self/pandas/pandas/core/reshape/merge.py", line 355, in merge
    raise ValueError(f"'{how}' is not a valid Merge type ({merge_type})")
ValueError: 'full' is not a valid Merge type (['left', 'right', 'inner', 'outer', 'cross'])

Not sure if it should raise ValueError or MergeError

@@ -350,7 +350,10 @@ def merge(
left_df = _validate_operand(left)
left._check_copy_deprecation(copy)
right_df = _validate_operand(right)
if how == "cross":
merge_type = ["left", "right", "inner", "outer", "cross"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in _MergeOperation.

And a test is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke - moved it to _MergeOperation and added asof as valid Merge Type.


Added test function to validate it as well. Thanks

@mroeschke mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas labels Aug 7, 2024
@KevsterAmp
Copy link
Contributor Author

@mroeschke - Added tests and added asof as valid merge type, running pytest pandas/tests/reshape/merge passes all tests.

(pandas-dev) kev@mac pandas % pytest pandas/tests/reshape/merge
+ /opt/homebrew/Caskroom/miniforge/base/envs/pandas-dev/bin/ninja
[4/4] Linking target pandas/_libs/tslibs/offsets.cpython-310-darwin.so
============================================================================================================ test session starts =============================================================================================================
platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
PyQt5 5.15.9 -- Qt runtime 5.15.8 -- Qt compiled 5.15.8
rootdir: /Users/kev/self/pandas
configfile: pyproject.toml
plugins: localserver-0.0.0, qt-4.4.0, cov-5.0.0, anyio-4.4.0, hypothesis-6.108.7, cython-0.3.1, xdist-3.6.1
collected 1332 items

pandas/tests/reshape/merge/test_join.py ..........................................................................
pandas/tests/reshape/merge/test_merge.py .................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
pandas/tests/reshape/merge/test_merge_asof.py .s...................................................................................................s.....................s.............................
pandas/tests/reshape/merge/test_merge_cross.py .................
pandas/tests/reshape/merge/test_merge_index_as_string.py ................................................................................
pandas/tests/reshape/merge/test_merge_ordered.py .....................
pandas/tests/reshape/merge/test_multi.py ..........................................

------------------------------------------------------------------------------------------ generated xml file: /Users/kev/self/pandas/test-data.xml ------------------------------------------------------------------------------------------
============================================================================================================ slowest 30 durations ============================================================================================================
0.02s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_full_outer_join
0.02s call     pandas/tests/reshape/merge/test_join.py::test_join_multiindex_not_alphabetical_categorical[categories0-values0]
0.02s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_indexes
0.02s call     pandas/tests/reshape/merge/test_multi.py::TestMergeMulti::test_compress_group_combinations
0.02s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_right_outer_join
0.01s call     pandas/tests/reshape/merge/test_multi.py::TestMergeMulti::test_left_join_multi_index[True-True]
0.01s call     pandas/tests/reshape/merge/test_multi.py::TestMergeMulti::test_left_join_multi_index[False-True]
0.01s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_left_outer_join
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_validation
0.01s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_join_many_non_unique_index
0.01s setup    pandas/tests/reshape/merge/test_merge_index_as_string.py::test_merge_indexes_and_columns_lefton_righton[left_df2-right_df1-left_on3-right_on3-outer]
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_indicator_result_integrity
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_indicator
0.01s call     pandas/tests/reshape/merge/test_multi.py::TestMergeMulti::test_left_join_multi_index[True-False]
0.01s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_inner_join
0.01s setup    pandas/tests/reshape/merge/test_merge_index_as_string.py::test_merge_indexes_and_columns_lefton_righton[left_df2-right_df2-left_on1-right_on1-right]
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_non_unique_index_many_to_many
0.01s call     pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_left_empty_right_notempty
0.01s call     pandas/tests/reshape/merge/test_multi.py::TestMergeMulti::test_left_join_multi_index[False-False]
0.01s call     pandas/tests/reshape/merge/test_join.py::TestJoin::test_join_multiindex

(10 durations < 0.005s hidden.  Use -vv to show these durations.)
====================================================================================================== 1329 passed, 3 skipped in 2.28s =======================================================================================================

@KevsterAmp KevsterAmp requested a review from mroeschke August 12, 2024 01:53
@mroeschke
Copy link
Member

Please view the CI logs as another test needs fixing https://github.com/pandas-dev/pandas/actions/runs/10337314281/job/28614031854?pr=59435

@KevsterAmp KevsterAmp force-pushed the add-pandas-merge-how-param-validation branch from 7f8b512 to 86b4055 Compare August 21, 2024 12:35
@KevsterAmp KevsterAmp requested a review from mroeschke August 21, 2024 22:55
@mroeschke
Copy link
Member

pre-commit.ci autofix

@mroeschke mroeschke added this to the 3.0 milestone Aug 22, 2024
@mroeschke mroeschke merged commit 2748365 into pandas-dev:main Aug 22, 2024
47 checks passed
@mroeschke
Copy link
Member

Thanks @KevsterAmp

matiaslindgren pushed a commit to matiaslindgren/pandas that referenced this pull request Aug 25, 2024
* add merge type validation on pandas.merge

* add ENH to latest whatsnew doc

* move merge type validation to _MergeOperation class

* add tests on merge validation; add asof as valid mergetype

* change merge_type from tuple to dict

* change ValueError message to updated change

* change Error message on merge type

* update test error messages for merge type errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: UnboundLocalError when full outer merging two dataframes
2 participants