Bump up hadoop-apache2 version to 2.7.4-11 #18037

Closed
imjalpreet wants to merge 1 commit


imjalpreet
Member

== NO RELEASE NOTE ==

@imjalpreet requested a review from a team as a code owner on July 14, 2022, 11:34
@imjalpreet
Member Author

@tdcmeehan @rschlussel I have bumped the hadoop-apache2 version (related to prestodb/presto-hadoop-apache2#49)

@tdcmeehan
Contributor

@imjalpreet it looks like there are test failures

@imjalpreet
Member Author

@tdcmeehan I will have a look and update the PR

@imjalpreet
Member Author

@tdcmeehan I just checked: another PR (prestodb/presto-hadoop-apache2#47) was merged into hadoop-apache2, and it is causing these failures.

It looks like the config removed in that PR is still required for the current tests to pass. What do you suggest?

@rohanpednekar
Contributor

Hey @tdcmeehan, any suggestions on Jalpreet's question?

@tdcmeehan
Contributor

> Hey @tdcmeehan, any suggestions on Jalpreet's question?

@imjalpreet and I discussed on Slack and he will be figuring out a path forward.

@imjalpreet
Member Author

Because of the PR (prestodb/presto-hadoop-apache2#47), some of the current tests are failing with the Hadoop 2.x dependency. After some research, I realised that the config removed in that PR is not required when using Hadoop 3.x but is still needed with the Hadoop 2.x dependency.

To resolve this, we have two options: we can either revert the above PR and bring in the remaining changes, or work on upgrading to Hadoop 3.x.

After a discussion with @tdcmeehan, we decided it would be better in the long run if we work on upgrading to Hadoop 3.x since it has been pending for a long time.

I was looking into this and saw that we have a branch https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x in presto-hadoop-apache2 which was created a couple of years back. @tdcmeehan Do you have an idea of why we did not merge it with master and release it? We can work on top of that branch unless there were some blockers due to which it wasn't released.

@agrawalreetika
Member

I landed in this PR while checking if we have any plans for a Hadoop upgrade.
Do we have any plan to upgrade the Hadoop version to 3.2.x? I see there is this PR prestodb/presto-hadoop-apache2#57 to get it updated to 3.2.3.
I am not sure if it requires any more work for the Hadoop version upgrade. @imjalpreet Do you know if there are any prerequisites for this upgrade? I see there were some test failures with Hadoop version upgrades earlier.

After the Hadoop version upgrade, we can also add a required dependency for supporting Azure Data Lake file system. I see it's added from hadoop-2.8.0 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure-datalake

@imjalpreet
Member Author

> I see there is this PR prestodb/presto-hadoop-apache2#57 to get it updated to 3.2.3.

That PR was auto-generated and is not enough; significant changes are needed to upgrade to Hadoop 3.

I worked with Rajat to get the Hadoop 3.2.x upgrade changes into https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x a few months back. We should have almost all the changes on the Hadoop dependency side, but we also need to look into updating the Docker images used in the CI pipelines, since they are currently based on Hadoop 2.7.4. There might be some Presto code changes as well.

> After the Hadoop version upgrade, we can also add a required dependency for supporting Azure Data Lake file system. I see it's added from hadoop-2.8.0 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure-datalake

Yes, that was one of the aims of this effort when I started looking into it last year, but it has been delayed due to prioritisation. We wanted to upgrade to 3.2.x since ADLS Gen 2 support was added in Hadoop 3 and there have been a few requests for that as well.

We should have a discussion on the plan and proceed from there.

@imjalpreet
Member Author

Merged with #21483

@imjalpreet closed this on Dec 6, 2023