Bump up hadoop-apache2 version to 2.7.4-11 #18037
imjalpreet commented on Jul 14, 2022
@tdcmeehan @rschlussel I have bumped the hadoop-apache2 version (related to prestodb/presto-hadoop-apache2#49)
@imjalpreet it looks like there are test failures
@tdcmeehan I will have a look and update the PR
@tdcmeehan I just checked: another PR (prestodb/presto-hadoop-apache2#47) was merged into hadoop-apache2, and it is causing these failures. It looks like the config removed in that PR is still required for the current tests to pass. What do you suggest?
Hey @tdcmeehan, any suggestions on Jalpreet's question?
@imjalpreet and I discussed on Slack and he will be figuring out a path forward.
Due to the PR (prestodb/presto-hadoop-apache2#47), some of the current tests are failing with the Hadoop 2.x dependency. After some research, I realised that the config removed in that PR is not required with Hadoop 3.x but is still needed with the Hadoop 2.x dependency. To resolve this, we have two options: either revert that PR and bring in the remaining changes, or work on upgrading to Hadoop 3.x. After a discussion with @tdcmeehan, we decided it would be better in the long run to work on upgrading to Hadoop 3.x, since it has been pending for a long time. I was looking into this and saw that presto-hadoop-apache2 has a branch, https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x, which was created a couple of years back. @tdcmeehan Do you have an idea of why we did not merge it with master and release it? We can work on top of that branch unless there were blockers that kept it from being released.
I landed on this PR while checking whether we have any plans for a Hadoop upgrade. After the Hadoop version upgrade, we can also add the dependency required to support the Azure Data Lake file system. I see it is available from hadoop-2.8.0 onwards: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure-datalake
That PR was auto-generated and is not enough; significant changes are needed to upgrade to Hadoop 3. I worked with Rajat to get the Hadoop 3.2.x upgrade changes into https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x a few months back. We should have almost all the changes on the Hadoop dependency side, but we also need to look into updating the Docker images used in the CI pipelines, since they are currently based on Hadoop 2.7.4. There might be some Presto code changes as well.
Yes, that was one of the aims of this effort when I started looking into it last year, but it has been delayed due to prioritisation. We wanted to upgrade to 3.2.x since ADLS Gen 2 support was added in Hadoop 3 and there have been a few requests for it as well. We should have a discussion on the plan and proceed from there.
Merged with #21483