Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remoteproc: fix management of non loadable segments #553

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

UmairKhanUET
Copy link
Contributor

@UmairKhanUET UmairKhanUET commented Feb 7, 2024

The elf loader assumes that the last ELF program segment will always be a LOAD type segment. I deduce this from the fact that the elf_load() function, when loading the remote ELF sections during the RPROC_LOADER_READY_TO_LOAD stage, compares the last load segment num to the total ELF sections to determine if the loading is complete and it can move to the next stage RPROC_LOADER_POST_DATA_LOAD. If the last program segment in the ELF is not of type LOAD, the last loaded LOAD segment never equals total ELF sections. This creates an error condition and the firmware loading fails. This patch fixes this issue by comparing the last loaded LOAD segment number with the max LOAD segment number in the ELF.

This PR fixes #552

Signed-off-by: Muhammad Umair Khan [email protected]

@UmairKhanUET
Copy link
Contributor Author

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

@arnopo
Copy link
Collaborator

arnopo commented Feb 7, 2024

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

The elf_loader.c was upstreamed before we put in place the checkpatch. That's why you can see some old stlyles occurrences.
Nevertheless, for new patch we expect that the coding rules are respected when possible.
So please fix it, unless you see a reason not to do it.

@UmairKhanUET
Copy link
Contributor Author

UmairKhanUET commented Feb 9, 2024

Can we take an exception to the UNNECESSARY_ELSE error here? Various other functions in the elf_loader.c use the same style as the one used in this PR.

The elf_loader.c was upstreamed before we put in place the checkpatch. That's why you can see some old stlyles occurrences. Nevertheless, for new patch we expect that the coding rules are respected when possible. So please fix it, unless you see a reason not to do it.

@arnopo I have resolved the warning. Please resume your review.

@arnopo
Copy link
Collaborator

arnopo commented Feb 9, 2024

Could you provide some debug trace of your issue, that we understand what you mean by "This creates an error condition and the firmware loading fails" ?
regarding the code seems that you falls into this condition : https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L574-L579

Copy link
Contributor

@edmooring edmooring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes might not be needed. Comments to clear up any misunderstanding on my part are welcome.

lib/remoteproc/elf_loader.c Show resolved Hide resolved
@UmairKhanUET
Copy link
Contributor Author

Code changes might not be needed. Comments to clear up any misunderstanding on my part are welcome.

@edmooring I believe I answered your concern in the other comment thread. I am re-requesting review. Feel free to leave further comments if you have more questions.

@UmairKhanUET
Copy link
Contributor Author

UmairKhanUET commented Feb 12, 2024

Could you provide some debug trace of your issue, that we understand what you mean by "This creates an error condition and the firmware loading fails" ? regarding the code seems that you falls into this condition : https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L574-L579

Yeah, you got it right @arnopo , it does land at the mentioned location. Technically, it shouldn't have landed here because it was already past the last LOAD segment in the previous iteration. This time now, it returns back from https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L578 with a valid da of the non-LOAD segment returned by the elf_next_load_segment() function, without setting the load state to RPROC_LOADER_POST_DATA_LOAD and with nmemsize = 0 (the nmemsize variable is updated to the segment length after the point of return https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/elf_loader.c#L581 )

As the da is valid, the calling function remoteproc_load() will attempt to load it and call remoteproc_mmap(). With nmemsize equal to 0, the error condition in in remoteproc_mmap() at line https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/remoteproc.c#L381 will hit. The pa will remain as METAL_BAD_PHYS which will make remoteproc_load() enter the error condition at https://github.com/OpenAMP/open-amp/blob/main/lib/remoteproc/remoteproc.c#L562 and hence the loading fails.

I am putting the trace log for further information:

metal: debug:     remoteproc_load: check remoteproc status
metal: debug:     remoteproc_load: open executable image
metal: debug:     remoteproc_load: check loader
metal: debug:     remoteproc_load: loading headers
metal: debug:     Loading ELF headering
metal: debug:     Loading ELF program header.
metal: debug:     Loading ELF section header.
metal: debug:     remoteproc_load, load header 0x0, 0x100, next 0xcbc98, 0x550
metal: debug:     Loading ELF section header.
metal: debug:     Loading ELF section header complete.
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbc98, 0x550, next 0xcbb25, 0x170
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbb25, 0x170, next 0xcbb25, 0x0
metal: debug:     remoteproc_load: load executable data
metal: debug:     segment: 2, total segs 4
metal: debug:     load data: da 0x60000000, offset 0xb8, len = 0x6500, memsize = 0x9b64, state 0x10802
metal: debug:     segment: 3, total segs 4
metal: debug:     load data: da 0x60009b64, offset 0x6664, len = 0x19c, memsize = 0x19c, state 0x10803
metal: debug:     cannot find more segment
metal: debug:     load data: da 0x60006194, offset 0x624c, len = 0x0, memsize = 0x0, state 0x10804
metal: error:     load failed, no mapping for 0x60006194.

With the changes under review here, the loading is successful:

metal: debug:     remoteproc_load: check remoteproc status
metal: debug:     remoteproc_load: open executable image
metal: debug:     remoteproc_load: check loader
metal: debug:     remoteproc_load: loading headers
metal: debug:     Loading ELF headering
metal: debug:     Loading ELF program header.
metal: debug:     Loading ELF section header.
metal: debug:     remoteproc_load, load header 0x0, 0x100, next 0xcbc98, 0x550
metal: debug:     Loading ELF section header.
metal: debug:     Loading ELF section header complete.
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbc98, 0x550, next 0xcbb25, 0x170
metal: debug:     Loading ELF shstrtab.
metal: debug:     remoteproc_load, load header 0xcbb25, 0x170, next 0xcbb25, 0x0
metal: debug:     remoteproc_load: load executable data
metal: debug:     segment: 2, last load seg 3
metal: debug:     load data: da 0x60000000, offset 0xb8, len = 0x6500, memsize = 0x9b64, state 0x10802
metal: debug:     segment: 3, last load seg 3
metal: debug:     load data: da 0x60009b64, offset 0x6664, len = 0x19c, memsize = 0x19c, state 0x20803
metal: debug:     load data: da 0xffffffffffffffff, offset 0x0, len = 0x0, memsize = 0x0, state 0x40803
metal: debug:     remoteproc_load, update resource table
metal: debug:     remoteproc_load: successfully load firmware

@arnopo
Copy link
Collaborator

arnopo commented Feb 15, 2024

Thanks @UmairKhanUET for the details and trace.
Has @edmooring mentioned this This code is pretty old and has never been matured.

Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

		nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
		phdr = elf_next_load_segment(*img_info, &nsegment, da,
					     noffset, &nsize, &nsegmsize);

		phnums = elf_phnum(*img_info);
		if (phdr) {
			*nlen = nsize;
			*nmemsize = nsegmsize;
			metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
		}

		if (nsegment == phnums) {
			if (phdr) {
 				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_POST_DATA_LOAD;
			} else {
 			        metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_LOAD_COMPLETE;
				      *da = RPROC_LOAD_ANYADDR;
		}

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

@UmairKhanUET
Copy link
Contributor Author

Thanks @UmairKhanUET for the details and trace. Has @edmooring mentioned this This code is pretty old and has never been matured.

Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

		nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
		phdr = elf_next_load_segment(*img_info, &nsegment, da,
					     noffset, &nsize, &nsegmsize);

		phnums = elf_phnum(*img_info);
		if (phdr) {
			*nlen = nsize;
			*nmemsize = nsegmsize;
			metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
		}

		if (nsegment == phnums) {
			if (phdr) {
 				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_POST_DATA_LOAD;
			} else {
 			        metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_LOAD_COMPLETE;
				      *da = RPROC_LOAD_ANYADDR;
		}

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

Thanks @arnopo for the feedback. I will try this out and post the results here.

@UmairKhanUET
Copy link
Contributor Author

Thanks @UmairKhanUET for the details and trace. Has @edmooring mentioned this This code is pretty old and has never been matured.
Regarding your commit, I wonder if we could not simplify this by refactoring the existing code instead of adding elf_max_load_phnum()

		nsegment = *load_state & ELF_NEXT_SEGMENT_MASK;
		phdr = elf_next_load_segment(*img_info, &nsegment, da,
					     noffset, &nsize, &nsegmsize);

		phnums = elf_phnum(*img_info);
		if (phdr) {
			*nlen = nsize;
			*nmemsize = nsegmsize;
			metal_log(METAL_LOG_DEBUG, "segment: %d, total segs %d\r\n", nsegment, phnums);
		}

		if (nsegment == phnums) {
			if (phdr) {
 				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_POST_DATA_LOAD;
			} else {
 			        metal_log(METAL_LOG_DEBUG, "no more segment to load\r\n");
				*load_state = (*load_state & (~RPROC_LOADER_MASK)) |
				      RPROC_LOADER_LOAD_COMPLETE;
				      *da = RPROC_LOAD_ANYADDR;
		}

Unfortunately I have no hardware to test the loader. Could you test it ( with probably few fixes) on your side and give me a feedback?

Thanks @arnopo for the feedback. I will try this out and post the results here.

Hi @arnopo ,
I tried your change and it works fine. I also stress tested it by adding two trailing non-LOAD program segments instead of one and it nicely caters this scenario as well. I have pushed the updates in this PR.

Copy link
Contributor

@edmooring edmooring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to go.

@arnopo
Copy link
Collaborator

arnopo commented Feb 28, 2024

@UmairKhanUET
Before i merge it, please change the subject for something more explicit with a remoteproc: prefix
For instances "remoteproc: fix management of non loadable segments"

@UmairKhanUET UmairKhanUET changed the title elf loader: The rproc elf loading problem remoteproc: fix management of non loadable segments Feb 28, 2024
@UmairKhanUET
Copy link
Contributor Author

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

@arnopo
Copy link
Collaborator

arnopo commented Feb 28, 2024

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

unclear, I was speaking about the commit subject :-)

The elf loader assumes that the last ELF program segment will always
be a LOAD type segment. I deduce this from the fact that the elf_load()
function, when loading the remote ELF sections during the
RPROC_LOADER_READY_TO_LOAD stage, compares the last load segment num
to the total ELF sections to determine if the loading is complete and
it can move to the next stage RPROC_LOADER_POST_DATA_LOAD. If the last
program segment in the ELF is not of type LOAD, the last loaded LOAD
segment never equals total ELF sections. This creates an error
condition and the firmware loading fails. This patch fixes this issue
by comparing the last loaded LOAD segment number with the max LOAD
segment number in the ELF.

Signed-off-by: Umair Khan <[email protected]>
@UmairKhanUET
Copy link
Contributor Author

@UmairKhanUET Before i merge it, please change the subject for something more explicit with a remoteproc: prefix For instances "remoteproc: fix management of non loadable segments"

@arnopo Done

unclear, I was speaking about the commit subject :-)

@arnopo I hope I did it right this time :D

@arnopo arnopo merged commit 79b795e into OpenAMP:main Feb 29, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The remote ELF loading problem
3 participants