Pyarrow module for Wazuh agent - WARNING #7566

Closed
SamsonIdowu opened this issue Jul 24, 2024 · 4 comments · Fixed by #7589
Labels
level/task Task issue type/bug Bug issue

Comments

@SamsonIdowu
Member

Hello,

Following the documented steps for installing pyarrow does not make the module usable by the Wazuh agent.
This is related to the Release 4.9.0 - Alpha 3 - E2E UX tests - Amazon CloudWatch Logs integration.
Please review, as the agent logs the following warnings:

[root@fedora ~]# tail -f /var/ossec/logs/ossec.log
2024/07/24 18:01:11 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  Returned exit code 10
2024/07/24 18:01:11 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  pyarrow module is required.

2024/07/24 18:01:11 wazuh-modulesd:aws-s3: INFO: Fetching logs finished.
2024/07/24 18:02:04 wazuh-modulesd:aws-s3: INFO: Starting fetching of logs.
2024/07/24 18:02:04 wazuh-modulesd:aws-s3: INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)
2024/07/24 18:02:11 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  Returned exit code 10
2024/07/24 18:02:11 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  pyarrow module is required.

2024/07/24 18:02:11 wazuh-modulesd:aws-s3: INFO: Fetching logs finished.
2024/07/24 18:03:04 wazuh-modulesd:aws-s3: INFO: Starting fetching of logs.
2024/07/24 18:03:04 wazuh-modulesd:aws-s3: INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)
2024/07/24 18:03:10 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  Returned exit code 10
2024/07/24 18:03:10 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  pyarrow module is required.

cc: @fdalmaup

@SamsonIdowu SamsonIdowu changed the title Pyarrow module for Wazuh agent Pyarrow module for Wazuh agent - WARNING Jul 24, 2024
@fdalmaup fdalmaup added level/task Task issue type/bug Bug issue labels Jul 24, 2024
@javiersanchz javiersanchz self-assigned this Jul 26, 2024
@javiersanchz
Member

Update

  • I began investigating why pyarrow is not being found, even though the dependency installation output shows that it is installed.
  • I discussed this further with Facu, who mentioned that the cause seems to be an incompatibility with numpy: the installation pulls in version 2.0.1, a fairly recent release.
  • On Monday, I will continue testing with earlier numpy versions that are compatible with pyarrow, such as the one pinned in framework/requirements.txt, which is numpy==1.26.0.

@javiersanchz
Member

javiersanchz commented Jul 29, 2024

Update

The error mentioned in the issue was reproduced on the agent by following the current steps in docu-v4.9.0-alpha3.

The same configuration used in the E2E tests was added to /var/ossec/etc/ossec.conf, and the credentials were added to .aws/credentials.

The necessary dependencies from the documentation were installed: pip3 install boto3==1.17.85 pyarrow==14.0.1

root@474fb88aa286:/# pip freeze
boto3==1.17.85
botocore==1.20.112
cryptography==3.4.8
jmespath==0.10.0
numpy==2.0.1
pyarrow==14.0.1
python-dateutil==2.9.0.post0
s3transfer==0.4.2
six==1.16.0
urllib3==1.26.19

After restarting the agent, we observed the error:

2024/07/29 10:38:47 wazuh-modulesd:aws-s3: INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)
2024/07/29 10:38:47 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  Returned exit code 10
2024/07/29 10:38:47 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  pyarrow module is required
2024/07/29 10:38:47 wazuh-modulesd:aws-s3: INFO: Fetching logs finished.
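For anyone triaging the same symptom, the warnings above can be picked out of ossec.log programmatically. The following is only a minimal sketch: the regular expression is written against the non-debug log format shown in this comment, and the helper names (`find_service_warnings`, `pyarrow_missing`) are hypothetical, not part of Wazuh.

```python
import re

# Matches the non-debug warning format shown in the excerpt above.
# Assumption: debug-level lines (with PID and source location) use a
# different layout and are intentionally not matched here.
WARNING_RE = re.compile(
    r"wazuh-modulesd:aws-s3: WARNING: Service: (?P<service>\S+)\s+-\s+(?P<message>.+)"
)


def find_service_warnings(lines):
    """Return (service, message) pairs for aws-s3 service warnings."""
    found = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            found.append((match.group("service"), match.group("message").strip()))
    return found


def pyarrow_missing(lines):
    """True if any service warning reports that pyarrow is required."""
    return any(
        "pyarrow module is required" in message
        for _, message in find_service_warnings(lines)
    )


# Sample lines copied from the log excerpt in this comment.
SAMPLE = [
    "2024/07/29 10:38:47 wazuh-modulesd:aws-s3: INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)",
    "2024/07/29 10:38:47 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  Returned exit code 10",
    "2024/07/29 10:38:47 wazuh-modulesd:aws-s3: WARNING: Service: cloudwatchlogs  -  pyarrow module is required",
    "2024/07/29 10:38:47 wazuh-modulesd:aws-s3: INFO: Fetching logs finished.",
]

if __name__ == "__main__":
    for service, message in find_service_warnings(SAMPLE):
        print(f"{service}: {message}")
```

In practice the lines would come from reading /var/ossec/logs/ossec.log instead of the inline sample.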

And from the interpreter, this was the output:

root@474fb88aa286:/# python3
Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
  File "pyarrow/lib.pyx", line 36, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
>>> print(pyarrow.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'pyarrow' is not defined

  • This message indicates a compatibility issue between the installed versions of NumPy and pyarrow.
  • The installed pyarrow 14.0.1 build was compiled against NumPy 1.x but is being loaded in an environment with NumPy 2.0.1. Modules compiled against NumPy 1.x cannot run under NumPy 2.x, which causes the import to fail.
  • The specific error, AttributeError: _ARRAY_API not found, is described in the NumPy troubleshooting documentation: https://numpy.org/devdocs/user/troubleshooting-importerror.html
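The rule described in the bullets above can be checked without importing either package, which matters here because the import itself is what fails. This is a rough sketch only: it assumes the extension was built against NumPy 1.x (as the NumPy message states for this pyarrow build), and the helper names are hypothetical.

```python
from importlib import metadata


def major(version):
    """Return the major component of a version string such as '2.0.1'."""
    return int(version.split(".")[0])


def numpy_abi_mismatch(numpy_version, built_against_major=1):
    """True when the installed NumPy is newer (by major version) than the
    NumPy the extension module was compiled against.

    Assumption: built_against_major=1 reflects the NumPy message quoted
    above for this pyarrow build; it is not read from the wheel itself.
    """
    return major(numpy_version) > built_against_major


def installed_version(package):
    """Return the installed version of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    if numpy_version is None:
        print("numpy is not installed")
    elif numpy_abi_mismatch(numpy_version):
        print(f"numpy {numpy_version} is too new for a module built against NumPy 1.x")
    else:
        print(f"numpy {numpy_version} should be loadable by a module built against NumPy 1.x")
```

Reading the version from package metadata avoids triggering the failing `import pyarrow` path during the check.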

The next step was to uninstall both numpy and pyarrow:

root@474fb88aa286:/# pip3 uninstall numpy pyarrow
Found existing installation: numpy 2.0.1
Uninstalling numpy-2.0.1:
  Would remove:
    /usr/local/bin/f2py
    /usr/local/bin/numpy-config
    /usr/local/lib/python3.10/dist-packages/numpy-2.0.1.dist-info/*
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libgfortran-040039e1-0352e75f.so.5.0.0
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libquadmath-96973f99-934c22de.so.0.0.0
    /usr/local/lib/python3.10/dist-packages/numpy.libs/libscipy_openblas64_-99b71e71.so
    /usr/local/lib/python3.10/dist-packages/numpy/*
Proceed (Y/n)? y
  Successfully uninstalled numpy-2.0.1
Found existing installation: pyarrow 14.0.1
Uninstalling pyarrow-14.0.1:
  Would remove:
    /usr/local/lib/python3.10/dist-packages/pyarrow-14.0.1.dist-info/*
    /usr/local/lib/python3.10/dist-packages/pyarrow/*
Proceed (Y/n)? y
  Successfully uninstalled pyarrow-14.0.1

Then, pyarrow was reinstalled with the version given in the documentation, pyarrow==14.0.1, and numpy with the version pinned in Wazuh's framework/requirements.txt, numpy==1.26.0:

https://github.com/wazuh/wazuh/blob/master/framework/requirements.txt#L61

root@474fb88aa286:/# pip3 install numpy==1.26.0 pyarrow==14.0.1
Collecting numpy==1.26.0
  Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Collecting pyarrow==14.0.1
  Using cached pyarrow-14.0.1-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.0 kB)
Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 37.2 MB/s eta 0:00:00
Using cached pyarrow-14.0.1-cp310-cp310-manylinux_2_28_x86_64.whl (38.0 MB)
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.26.0 pyarrow-14.0.1
...
root@474fb88aa286:/# pip freeze
boto3==1.17.85
botocore==1.20.112
cryptography==3.4.8
jmespath==0.10.0
numpy==1.26.0
pyarrow==14.0.1
python-dateutil==2.9.0.post0
s3transfer==0.4.2
six==1.16.0
urllib3==1.26.19

And the interpreter was run again:

root@474fb88aa286:/# python3
Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> 
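The manual interpreter check above can also be scripted. The sketch below runs the import in a fresh subprocess so that a failed import cannot leave a half-initialized module in the calling process. The `can_import` helper is hypothetical, and `sys.executable` is only a default: it is an assumption that this is the same interpreter the agent module runs under.

```python
import subprocess
import sys


def can_import(module, python=sys.executable):
    """Try `import <module>` in a separate interpreter.

    Returns (ok, stderr_text). `python` should point at the interpreter
    whose site-packages are being checked.
    """
    result = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    ok, err = can_import("pyarrow")
    print("pyarrow import OK" if ok else f"pyarrow import failed:\n{err}")
```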

After the agent was restarted, the CloudWatch Logs integration produced the following output:

root@474fb88aa286:/# tail -f /var/ossec/logs/ossec.log | grep wazuh-modulesd           
2024/07/29 11:40:56 wazuh-modulesd:aws-s3[11380] wm_aws.c:703 at wm_aws_run_service(): DEBUG: Service: cloudwatchlogs  -  OUTPUT: DEBUG: +++ Debug mode on - Level: 2
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] wm_aws.c:201 at wm_aws_main(): INFO: Fetching logs finished.
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] schedule_scan.c:153 at _get_next_time(): WARNING: Interval overtaken.
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] wm_aws.c:84 at wm_aws_main(): INFO: Starting fetching of logs.
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] wm_aws.c:171 at wm_aws_main(): INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] wm_aws.c:558 at wm_aws_run_service(): DEBUG: Create argument list
2024/07/29 11:41:53 wazuh-modulesd:aws-s3[11380] wm_aws.c:662 at wm_aws_run_service(): DEBUG: Launching S3 Command: wodles/aws/aws-s3 --service cloudwatchlogs --aws_profile default --regions us-east-1 --aws_log_groups /aws/lambda/ec2-instance-autodeletion --debug 2

...

DEBUG: Getting CloudWatch logs from log stream "2024/07/29/[$LATEST]xxxxxxxx" in log group "/aws/lambda/ec2-instance-autodeletion" using token "f/xxxxxxxxxxxxxx/s", start_time "1722222154858" and end_time "None"
DEBUG: +++ There are no new events in the "/aws/lambda/ec2-instance-autodeletion" group
DEBUG: Saving data for log group "/aws/lambda/ec2-instance-autodeletion" and log stream "2024/07/29/[$LATEST]xxxxxxxxxxxxxxx".
DEBUG: The saved values are "{'token': 'f/xxxxxxxxxxxxxxxxxxxxxxxxxs', 'start_time': 1722211200000, 'end_time': 1722222154858}"
DEBUG: Some data already exists on DB for that key. Updating their values...

@javiersanchz
Member

javiersanchz commented Jul 29, 2024

Update

The next step is to repeat the documented procedure, this time adding the specific version numpy==1.26.0 when installing the dependencies. If everything works correctly, a PR will be opened to update the documentation with this pinned version:

pip3 install boto3==1.17.85 pyarrow==14.0.1 numpy==1.26.0

@javiersanchz
Member

Update

I followed the entire procedure in docu-v4.9.0-alpha3 again, changing only the final dependency installation command to add numpy==1.26.0:

root@50211a4e9179:/# pip3 install boto3==1.17.85 pyarrow==14.0.1 numpy==1.26.0
Collecting boto3==1.17.85
  Downloading boto3-1.17.85-py2.py3-none-any.whl.metadata (6.2 kB)
Collecting pyarrow==14.0.1
  Downloading pyarrow-14.0.1-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.0 kB)
Collecting numpy==1.26.0
  Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Collecting botocore<1.21.0,>=1.20.85 (from boto3==1.17.85)
  Downloading botocore-1.20.112-py2.py3-none-any.whl.metadata (5.6 kB)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3==1.17.85)
  Downloading jmespath-0.10.0-py2.py3-none-any.whl.metadata (8.0 kB)
Collecting s3transfer<0.5.0,>=0.4.0 (from boto3==1.17.85)
  Downloading s3transfer-0.4.2-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.21.0,>=1.20.85->boto3==1.17.85)
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting urllib3<1.27,>=1.25.4 (from botocore<1.21.0,>=1.20.85->boto3==1.17.85)
  Downloading urllib3-1.26.19-py2.py3-none-any.whl.metadata (49 kB)
Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.21.0,>=1.20.85->boto3==1.17.85)
  Downloading six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Downloading boto3-1.17.85-py2.py3-none-any.whl (131 kB)
Downloading pyarrow-14.0.1-cp310-cp310-manylinux_2_28_x86_64.whl (38.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.0/38.0 MB 41.9 MB/s eta 0:00:00
Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 41.9 MB/s eta 0:00:00
Downloading botocore-1.20.112-py2.py3-none-any.whl (7.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.7/7.7 MB 42.0 MB/s eta 0:00:00
Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Downloading s3transfer-0.4.2-py2.py3-none-any.whl (79 kB)
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading urllib3-1.26.19-py2.py3-none-any.whl (143 kB)
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: urllib3, six, numpy, jmespath, python-dateutil, pyarrow, botocore, s3transfer, boto3
Successfully installed boto3-1.17.85 botocore-1.20.112 jmespath-0.10.0 numpy-1.26.0 pyarrow-14.0.1 python-dateutil-2.9.0.post0 s3transfer-0.4.2 six-1.16.0 urllib3-1.26.19

...

root@50211a4e9179:/# pip freeze
boto3==1.17.85
botocore==1.20.112
cryptography==3.4.8
jmespath==0.10.0
numpy==1.26.0
pyarrow==14.0.1
python-dateutil==2.9.0.post0
s3transfer==0.4.2
six==1.16.0
urllib3==1.26.19

(numpy==1.26.0 was used because it is the version pinned in the wazuh/wazuh framework/requirements.txt file and is compatible with pyarrow==14.0.1.)
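The comparison between the two `pip freeze` listings in this thread can be automated. A minimal sketch, assuming the plain `name==version` format shown above; `parse_freeze`, `pin_mismatches`, and the `EXPECTED` mapping are hypothetical names built from the pins used in this comment.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {name: version} mapping."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


# Pins taken from the install command used in this comment.
EXPECTED = {"boto3": "1.17.85", "pyarrow": "14.0.1", "numpy": "1.26.0"}


def pin_mismatches(freeze_text, expected=EXPECTED):
    """Return {name: (wanted, installed_or_None)} for every pin that differs."""
    installed = parse_freeze(freeze_text)
    return {
        name: (wanted, installed.get(name))
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


# The failing environment from the earlier comment (numpy 2.0.1).
BROKEN = "boto3==1.17.85\nnumpy==2.0.1\npyarrow==14.0.1\n"
# The working environment after pinning numpy.
FIXED = "boto3==1.17.85\nnumpy==1.26.0\npyarrow==14.0.1\n"

if __name__ == "__main__":
    print(pin_mismatches(BROKEN))
    print(pin_mismatches(FIXED))
```

An empty result means every expected pin is installed at the wanted version.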

  • The same configuration was added to /var/ossec/etc/ossec.conf, the credentials were added, and wazuh_modules.debug=2 was set in /var/ossec/etc/internal_options.conf.
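Setting the debug option can be scripted as a small text transformation. This is only an illustration that assumes the file consists of `key=value` lines with `#` comments; the `set_option` helper is hypothetical, and it operates on a string so nothing is written to disk.

```python
def set_option(config_text, key, value):
    """Return config_text with `key` set to `value`.

    Existing uncommented assignments of `key` are rewritten in place;
    if none exist, a new `key=value` line is appended.
    """
    lines = config_text.splitlines()
    updated = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[index] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "# Modules debug level\nwazuh_modules.debug=0\n"
    print(set_option(original, "wazuh_modules.debug", 2), end="")
```

The agent still has to be restarted for the new debug level to take effect, as was done in this comment.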

This was the output of the CloudWatch Logs integration on the agent:

root@50211a4e9179:/# tail -f /var/ossec/logs/ossec.log | grep wazuh-modulesd
2024/07/29 12:55:11 wazuh-modulesd:aws-s3[3768] wm_aws.c:84 at wm_aws_main(): INFO: Starting fetching of logs.
2024/07/29 12:55:11 wazuh-modulesd:aws-s3[3768] wm_aws.c:171 at wm_aws_main(): INFO: Executing Service Analysis: (Service: cloudwatchlogs, Profile: default)
2024/07/29 12:55:11 wazuh-modulesd:aws-s3[3768] wm_aws.c:558 at wm_aws_run_service(): DEBUG: Create argument list
2024/07/29 12:55:11 wazuh-modulesd:aws-s3[3768] wm_aws.c:662 at wm_aws_run_service(): DEBUG: Launching S3 Command: wodles/aws/aws-s3 --service cloudwatchlogs --aws_profile default --regions us-east-1 --aws_log_groups wazuh-cloudwatchlogs-integration-tests --debug 2
2024/07/29 12:55:14 wazuh-modulesd:aws-s3[3768] wm_aws.c:703 at wm_aws_run_service(): DEBUG: Service: cloudwatchlogs  -  OUTPUT: DEBUG: +++ Debug mode on - Level: 2
2024/07/29 12:55:14 wazuh-modulesd:aws-s3[3768] wm_aws.c:201 at wm_aws_main(): INFO: Fetching logs finished.
2024/07/29 12:55:14 wazuh-modulesd:aws-s3[3768] wm_aws.c:80 at wm_aws_main(): DEBUG: Sleeping until: 2024/07/29 12:56:11
...
2024/07/29 12:56:13 wazuh-modulesd:aws-s3[3768] wm_aws.c:703 at wm_aws_run_service(): DEBUG: Service: cloudwatchlogs  -  OUTPUT: DEBUG: +++ Debug mode on - Level: 2
DEBUG: +++ Getting alerts from "us-east-1" region.
DEBUG: Generating default configuration for retries: mode standard - max_attempts 10
DEBUG: only logs: None
DEBUG: Getting log streams for "wazuh-cloudwatchlogs-integration-tests" log group
DEBUG: Found "wazuh-cloudwatchlogs-integration-tests" log stream in wazuh-cloudwatchlogs-integration-tests
DEBUG: Getting data from DB for log stream "wazuh-cloudwatchlogs-integration-tests" in log group "wazuh-cloudwatchlogs-integration-tests"
DEBUG: Token: "f/xxxxxxxxxxxxxxxxxxxxxx/s", start_time: "1722211200000", end_time: "1722211200003"
DEBUG: Getting CloudWatch logs from log stream "wazuh-cloudwatchlogs-integration-tests" in log group "wazuh-cloudwatchlogs-integration-tests" using token "f/xxxxxxxxxxxxxxxxxxxxxx/s", start_time "1722211200004" and end_time "None"
DEBUG: +++ There are no new events in the "wazuh-cloudwatchlogs-integration-tests" group
DEBUG: Saving data for log group "wazuh-cloudwatchlogs-integration-tests" and log stream "wazuh-cloudwatchlogs-integration-tests".
DEBUG: The saved values are "{'token': 'f/xxxxxxxxxxxxxxxxxxxx/s', 'start_time': 1722211200000, 'end_time': 1722211200004}"
DEBUG: Some data already exists on DB for that key. Updating their values...
DEBUG: Purging the BD
DEBUG: Getting log streams for "wazuh-cloudwatchlogs-integration-tests" log group
DEBUG: Found "wazuh-cloudwatchlogs-integration-tests" log stream in wazuh-cloudwatchlogs-integration-tests
DEBUG: committing changes and closing the DB
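The start_time and end_time values in the debug output are easier to read as dates. A small sketch, assuming they are Unix epoch timestamps in milliseconds (the `ms_to_utc` helper is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Integer arithmetic from the epoch avoids float rounding on milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(milliseconds):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=milliseconds)


if __name__ == "__main__":
    # start_time saved in the log above
    print(ms_to_utc(1722211200000).isoformat())  # 2024-07-29T00:00:00+00:00
```

Under that assumption, the saved start_time corresponds to midnight UTC on the day of the test.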

And the interpreter was run again:

root@50211a4e9179:/# python3
Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> exit()
root@50211a4e9179:/# 

Status: Done
4 participants