Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ci): support building python on windows #1885

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

An-DJ
Copy link
Contributor

@An-DJ An-DJ commented Oct 14, 2024

What does this PR do?

Support building python on Windows.

  1. Use (git) bash to run python steps on Windows (MYSY, MINGW)
  2. Add missing header
  3. Rename FuryLogLevel::ERROR to FuryLogLevel::ERR. I don't know why this enum make build failed, but it is successed if renaming it to FuryLogLevel::ERR
  4. Rename pyx built dynamic lib name '*.so' to '*.pyd'

Related issues

Close #798

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@chaokunyang
Copy link
Collaborator

Hi @An-DJ , it this PR ready for review?

@An-DJ
Copy link
Contributor Author

An-DJ commented Oct 31, 2024

Hi @An-DJ , it this PR ready for review?

No, CI result is fake, the build script didn't have been executed successfully. We need more works on windows platform...

I will continue this PR in the next few days (too busy in this week...)

An-DJ added 15 commits November 2, 2024 11:11
Signed-off-by: Junduo Dong <[email protected]>
I don't know why this enum make build failed...
but successed if renaming it to FuryLogLevel::ERR

Signed-off-by: Junduo Dong <[email protected]>
* pyfury/_serialization.pyd
* pyfury/_util.pyd
* pyfury/format/_format.pyd
* pyfury/lib/mmh3/mmh3.pyd

Signed-off-by: Junduo Dong <[email protected]>
Signed-off-by: Junduo Dong <[email protected]>
Signed-off-by: Junduo Dong <[email protected]>
Signed-off-by: Junduo Dong <[email protected]>
Signed-off-by: Junduo Dong <[email protected]>
@An-DJ
Copy link
Contributor Author

An-DJ commented Nov 2, 2024

@chaokunyang This UT is strange.

@pytest.mark.parametrize("language", [Language.XLANG, Language.PYTHON])
def test_array_serializer(language):
    fury = Fury(language=language, ref_tracking=True, require_class_registration=False)
    for typecode in PyArraySerializer.typecode_dict.keys():
        arr = array.array(typecode, list(range(10)))

        # additional code to print array attribute
        print(arr.itemsize, arr.typecode)
        # end

        assert ser_de(fury, arr) == arr
    for dtype in Numpy1DArraySerializer.dtypes_dict.keys():
        arr = np.array(list(range(10)), dtype=dtype)
        new_arr = ser_de(fury, arr)
        assert np.array_equal(new_arr, arr)
        np.testing.assert_array_equal(new_arr, arr)

I test it on Ubuntu 20.04 and Windows 10, but there are different result:

# Ubuntu

pyfury/tests/test_serializer.py::test_array_serializer[Language.XLANG] 2 h
4 i
8 l
4 f
8 d
PASSED
pyfury/tests/test_serializer.py::test_array_serializer[Language.PYTHON] 2 h
4 i
8 l
4 f
8 d
PASSED
# Windows

=================== 1 failed, 1 passed, 3 warnings in 0.88s ===================
FAILED         [ 50%]
2 h
4 i
4 l

The point is that l type could be 4 or 8 bit on different platform.

In addition, in _util.pyx, there are also failed converted between const int64_t and long:

cdef inline uint8_t* get_address(v):
    view = memoryview(v)
    cdef str dtype = view.format
    cdef:
       # ...
        const int64_t[:] signed_long_data
       # ...
        uint8_t* ptr
    if dtype == "b":
        signed_char_data = v
        ptr = <uint8_t*>(&signed_char_data[0])
    elif dtype == "l":  # here
        signed_long_data = v
        ptr = <uint8_t*>(&signed_long_data[0])
    else:
        raise Exception(f"Unsupported buffer of type {type(v)} and format {dtype}")
    return ptr

Could you plz give me some advice?

@An-DJ An-DJ marked this pull request as ready for review November 2, 2024 16:26
@chaokunyang
Copy link
Collaborator

chaokunyang commented Nov 2, 2024

Is Windows a big endian machine?

@An-DJ
Copy link
Contributor Author

An-DJ commented Nov 3, 2024

Is Windows a big endian machine?

You means the github windows-2022 ci runner is hosted on a big endian machine?

No. According to the related document, for Linux and Windows runners, GitHub uses Dadsv5-series virtual machines. For more information, see Dasv5 and Dadsv5-series in the Microsoft Azure documentation.

image

AMD EPYC 7763v (Genoa) [x86-64] support small endian only.

@An-DJ
Copy link
Contributor Author

An-DJ commented Nov 3, 2024

@chaokunyang Actually we check the python array initialization method array.array with different dtype, but each type code has its Mininum size in bytes, not a fixed size.

IMHO it is the reason why it behaves differently on UNIX and Windows.

Ref: https://docs.python.org/3/library/array.html

image

We should find a strict way to fix each dtype size... right?

@chaokunyang
Copy link
Collaborator

Yeah, you are right. Fury take l as 4 bytes, but it may be 8 bytes on windows. In such cases, we may need to expand every item from 4 bytes value to 8 bytes in a loop manually.

@An-DJ
Copy link
Contributor Author

An-DJ commented Dec 12, 2024

Hi @chaokunyang

Recently I have searched and found that windows(MSVC) standards basic data type range. Also in intel docs.

Unfortunately, long, which used as l in Python, (on 64-bit machine) is stricted in 4 bytes on Windows, but 8 bytes on *nix. That is a gap between *nix and Windows.

So Apache Fury support q (in another words, signed long long) ONLY maybe a better choice. That type is 8 bytes size in both *nix and Windows.

I am not fimilair with Python array.array() users' habit, but this approach could be a simple solution to universe the memory model in Apache Fury Python support.

What do you think of that?

@chaokunyang
Copy link
Collaborator

Great idea, I think we can use q (signed long long) instead of l.

@chaokunyang
Copy link
Collaborator

Great idea, I think we can use q (signed long long) instead of l. And raise Error when l array format type are passed.

@@ -35,7 +35,7 @@ enum class FuryLogLevel {
DEBUG = -1,
INFO = 0,
WARNING = 1,
ERROR = 2,
ERR = 2,
Copy link
Member

@PragmaTwice PragmaTwice Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename FuryLogLevel::ERROR to FuryLogLevel::ERR. I don't know why this enum make build failed, but it is successed if renaming it to FuryLogLevel::ERR

Could you attach the error message/log you face about the original ERROR identifier?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a glance, and I think that maybe the Cython generated code defines a macro named ERROR and it's included before this header so that weird thing happens. Not familiar with related code here, maybe @penguin-wwy has ideas about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Windows build support
3 participants