Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

603: Always generate secondary instance for selects #614

Merged
merged 20 commits into from
Jul 31, 2023

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Jul 6, 2022

Closes enketo/enketo#216

Interesting git history from 2012: a change to itemsets for everything, then reverted the next day for backwards compatibility.

Why is this the best possible solution? Were any other approaches considered?

Discussed in enketo/enketo#216 and references. Lots of changes required but generally explained in commit messages or new docstrings.

What are the regression risks?

Risk that some XForm consuming apps may rely on selects not always generating itemsets. Risk that since there are a lot of code changes there may be bugs, although tests were added / updated.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

It's not a user-facing so probably not.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run nosetests and verified all tests pass
  • run black pyxform tests to format code
  • verified that any code or assets from external sources are properly credited in comments

- survey.py: previous code would output position-based and
  reference-based translations (e.g. mylist-0 and choice:label), so
  it now skips reference-based translations for select_types.
- survey_element.py: related to the above change, test_xform2json.py
  test_convert_toJSON_multi_language failed due to circular reference.
  Change traverses translations to remove offending objects.
- pyxform_test_case.py: added the test case number to the debug/fail
  output, to help when there are many XPath expressions.
- numerous test fixes to accommodate updated choice output.
- now covered by tests/xpath_helpers/choices.py
- The test was broken by changes to choices processing (XLSForm#603).
- The test conditions seem to relate only to the instance_xmlns feature.
- The new test asserts the same using the pyxformtestcase pattern.
- builder.py: add type hint, remove unnecessary dict
- xform2json.py:
  - _set_survey_name now handles multiple instances
  - _set_submission_info now sets itemset details from secondary
     instance, and attempts to parse choice filters
  - _set_translations now returns value to be set in __init__()
  - _get_text_from_translation now does media comparison as a list,
    since a dictkeysview is not a list
  - add _get_choices to include choices in internal xform representation
- xls2json.py
  - skip adding choice lists that aren't used in the xform to avoid
    extra unnecessary data. Maybe worth a warning or maybe not.
  - add secondary instance for table-list appearance, but reject it if
    it was being added with a choice filter (not supported).
- xform2json_test.py:
  - expand test_load_from_dump variables for easier debugging
  - add test cases for choice filters, field-list and table-list
  - these test cases can be refactored to PyxformTestCase at some stage
- xls2xform_tests.py: use equivalent test method instead of keyword
- update expected XML from inline choices to secondary itemset for the
  following (these can be refactored to XPath PyxformTestCase):
  - test_table_list.py
  - repeat_date_test.xml
  - xml_escaping.xml
- reasonably confident that these are the expected types
- section should import SurveyElement from where it is defined.
- A NotImplementedError is commonly used to indicate a base method that
  should be implemented in child classes, and it is used in
  SurveyElement for xml_control(). The Option and Tag classes don't
  seem to need this method, so adding an implementation that raises this
  error is a bit clearer.
- move choices logic off to new function add_choices_info_to_question()
  - exclude select with appearance=search() from itemset (per comment)
  - new func for readability and to clarify what info is used to
    determine choices structures
- survey.py: skip adding translations for choices, instead of just for
  choice_filter. Use type "Option" to detect choice elements which
  should be more reliable. Move get_xpath lookup under if condition so
  it's only called if needed.
- xls2json.py: fix lookup of select type "select one external" which
  has an alias "select_one_external". Docs use the underscore alias
  but the implementation is looking for the spaces alias.
- xlsform_test_case/base.py: simplify reporter function
- xlsform_spec_test.py: simplify repetitive tests by adding test func.
  Ideally these should be re-implemented as PyxformTestCase, but they
  cover a lot of scenarios so for now just simplifying the structure.
- test_expected_output/*.xml: update expected XML to match current
  behaviour, generally: 1) update body to refer to instance, 2) add new
  instances in model, 3) add new translations in itext, 4) remove old
  itext from translations.
- Misc minor fixes:
  - utils.py: fix deprecation warning for get_sheet_by_name.
  - replace in-line regex with pre-compiled regex, and replace match
    (only looks at start of string) with search (looks at whole string).
  - add more constants for commonly-used internal strings so that it's
    easier to find where they are used, added usages to xls2json.py.
- The right number of empty rows would be removed afterwards anyway, but
  for someone reading the code later it'd be better if that number was
  as stated.
- yes_or_no_question and xml_tests updated for new itemset output for
  selects. The default_survey_sheet test expected no default_form_name.
- add type hints to workbook_to_json and parse_file_to_json.
@lognaturel
Copy link
Contributor

lognaturel commented May 8, 2023

Interesting git history from 2012: a change to itemsets for everything, then reverted the next day for backwards compatibility.

There are three reasons we've discussed:

Of those, I think the second point has the highest risk. We've had a post on the forum announcing this change for a year, though, so hopefully folks building downstream tools have had time to adapt.

Have you looked at or_other yet? I think for now it makes sense to inject an other choice. I haven't looked -- is that what the 2012 implementation did?

@tiritea
Copy link

tiritea commented May 8, 2023

sorry, I only just noticed this the other day (I get spammed by so many git updates I tend to tune them out... :-( Would it be possible to add a commandline option to continue to generate 'inline' select option when desired? I do actually currently rely on them because I havent completed parsing secondary instances in my (iOS) XForms parser yet. And given it is such a breaking change, it might still be nice to still support it (as deprecated), even if you have you now explicitly enable it via said option. Otherwise I guess I'll have to fork the current version and use that in the interim.

@lindsay-stevens
Copy link
Contributor Author

@lognaturel there's a few more translations test to update, then the last implementation thing to look at should be or_other. The search() appearance now generates an itemset - unrelated to this PR (before and after), search() works in Collect but not Enketo.

@lognaturel
Copy link
Contributor

More conversation with @tiritea on the forum.

The search() appearance now generates an itemset - unrelated to this PR (before and after)

Really. And not an itemset AND static choices, only an itemset? I could have sworn I checked that as part of my comment long ago. Well, that makes that easy at least.

search() works in Collect but not Enketo.

Yes, that's right.

@lindsay-stevens
Copy link
Contributor Author

@lognaturel I meant:

  1. The search() appearance now generates an itemset (due to this PR)
  2. unrelated to this PR (before and after), search() works in Collect but not Enketo.

@lognaturel
Copy link
Contributor

Got it, thanks!

@tiritea
Copy link

tiritea commented May 9, 2023

FWIW I dont support search() yet, either, so please dont do anything funky here on my account... (I'll deal with all that after I get secondary instances working).

- test_translations.py:
  - remove xpath helpers now in xpath_helpers/choices.py
  - update test xpath assertions to match itemset behaviour
- choices.py:
  - improve xpath helpers to not require specification of position
- builder.py:
  - add or_other choice to survey-level choice dict
  - add type annotations
- constants.py: add new constant for "or specify other" internal suffix
- xls2json.py: add error for choice_filter + or_other. Main reason I
  suppose is that it's unknown how to filter the "other" choice so the
  user might never see it.
- xlsform_spec_test.py:
  - fix or_other expected XML to include "other" and match sorted
    element names.
  - move or_other test into this testcase since it is the same as others
    there already.
- builder_tests.py: update expected output to match or_other output.
@lindsay-stevens lindsay-stevens marked this pull request as ready for review May 12, 2023 12:32
@lindsay-stevens
Copy link
Contributor Author

@lognaturel ready for review. It's covering the cases mentioned in enketo/enketo#216, and any tests that failed as a result of code changes. I haven't combed through every test case though, e.g. in case there's some way to avoid generating itemsets.

@lognaturel
Copy link
Contributor

lognaturel commented Jul 17, 2023

Finally spending some focused time on this after speaking to @lindsay-stevens about it at the ODK Summit. We would like to release this by mid-August at the latest.

My review checklist:

  • Verify impact on some real forms
    • WHO VA: 990kb before, 593kb after (va_before_after.zip). Tried on client, no obvious issues. Central doesn't flag any schema change. Skimmed diff which looks as expected.
    • All widgets: 44kb before, 35kb after. I quickly skimmed through the form in Collect and Enketo. In Enketo:
Thing Before After
Likert likert-old likert-new
Selects with media select-img-old select-img-new

Haven't looked into what's going on yet. Ideally pyxform output could be modified slightly to not get those differences but it might not be possible.

  • or_other
  • translations
    • has code to generate translation names been removed? No, still relevant for media
  • labels that use values from form
    • An example. I don't think there's much reason to do this anymore now that we have selects from repeats but maybe people do things like get dynamic values from unrolled repeats or things like that. I verified that Collect and Enketo are happy with <output>s in itext.
  • "skip adding choice lists that aren't used in the xform to avoid extra unnecessary data. Maybe worth a warning or maybe not." in 1238bd6, This is existing behavior. I've filed Choice lists are only included in the output if they are referenced in a select. #647 for consideration. @lindsay-stevens I'm interested in what you think.
  • media, long/short forms - Lots of tests with image and big-image on choices. No support for long/short forms yet.

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really thoughtfully done. There's just one section I really can't wrap my brain around and makes me nervous. I've marked it inline.

There's some Enketo rendering weirdness that I think we should look into. Maybe it's just the spacing in the pyxform output? See above.

I still want to look a little bit more at translations and long/short forms but overall I think this is very close to merge.

@@ -244,6 +244,13 @@ def to_json_dict(self):
result["children"] = []
for child in children:
result["children"].append(child.to_json_dict())
# Translation items with "output_context" have circular references.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_context is a way to save the node reference for cases where there are output nodes so that ${} references can be expanded? I tried removing this scary bit of code and test_convert_toJSON_multi_language did fail. Then I set all places where output_context is set to None and some tests in test_repeat failed. I don't understand why this works, though. I thought it would remove all output_context values for all survey elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is reached by a call to survey.to_json so it shouldn't affect form conversion to XML.

The purpose of output_context is to hold a reference to a SurveyElement, so that references like ${} can be expanded. In this case we're going from "internal PyForm" structures into JSON - so I don't know what the intended JSON representation of output_context should be (maybe a copy of the node dict?). Or maybe the JSON output should have the de-referenced output somewhere (e.g. a string containing XML). It seems like it needs a separate issue to design and implement, or document.

Anyway, the problem at hand is that the node reference is a SurveyElement, which has a reference to it's parent Survey, which has references to it's child SurveyElements, which have references to the parent Survey, and so on. To break out of that loop, output_context is deleted so that survey.to_json() can return a document - it just would have the original referencing strings like My fav fruit is ${fruit}, and de-referencing those is left to the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is reached by a call to survey.to_json so it shouldn't affect form conversion to XML.

Aha, thank you, that's great to know.

name_of_main_section: str = None,
sections: Dict[str, Dict] = None,
main_section: Dict[str, Any] = None,
id_string: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id_string seems like it shouldn't be optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sprinkled on this type annotation to help understand data flow. The id_string (and perhaps other params) could be mandatory, but then it seemed to make sense to refactor the body of the function to remove is None checks, etc., which could easily snowball into more refactoring.

# Reference to list name for data dictionary tools (ilri/odktools).
question["list_name"] = list_name
# Copy choices for data export tools (onaio/onadata).
# TODO: could onadata use the list_name to look up the list for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One challenge with not having a documented API for pyxform at all is that it's not clear how to put things on a deprecation path. Did you happen to also look at Kobo for any direct usage of this kind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various kobo repos (including kobocat, kpi, formpack) import pyxform. formpack uses workbook_to_json, but I didn't look in to how they use the data structures. formpack seems to be used by kpi, but I'm not sure how that relates to kobocat or other projects. kpi dependency spec has pyxform==1.9.0.

@lognaturel
Copy link
Contributor

lognaturel commented Jul 19, 2023

Ok, @lindsay-stevens, I'm ready to merge once the couple of things above are addressed! The only real blocker for me is that I'm concerned dropping output_context won't work in every case. I haven't come up with a failing example yet but it'd be helpful to at least understand that chunk of code a bit better. I'd also like to at least try to get the Enketo formatting looking ok.

@lindsay-stevens
Copy link
Contributor Author

@lognaturel thanks for the review! I have responded to the review comments.

About the Enketo issue, I don't see where the pyxform output formatting could cause this. There's no spacing added to the output e.g. near the jr://images/ values, or the coded values in the itemsets. Maybe there is a different code path in Enketo for looking up labels via jr:itext(itextId) + itemset vs. directly e.g. jr:itext('/widgets/grid_test/a:label')"? If Collect is not doing the spacing thing then that also points to an Enketo issue.

@lognaturel
Copy link
Contributor

lognaturel commented Jul 31, 2023

I have filed Enketo issues: enketo/enketo#29, enketo/enketo#28

The first is bad but I don't think it's bad enough to block merge. We can communicate the issues and encourage platforms that use pyxform and Enketo to update both at the same time.

We will release this as pyxform 2.0 because there is a compatibility/interface change.

@lognaturel lognaturel merged commit 129abe9 into XLSForm:master Jul 31, 2023
@lindsay-stevens lindsay-stevens deleted the pyxform-603 branch August 1, 2023 02:47
@lognaturel
Copy link
Contributor

We've identified a number of Enketo bugs as we have done quality assurance on this change. We're holding release until we can resolve those.

@lognaturel
Copy link
Contributor

lognaturel commented Aug 24, 2023

All Enketo bugs we know about are fixed! 🎉

But now a user has reported this strange error. I can reproduce it on master and can't reproduce on v1.12.1. @lindsay-stevens could you please take a look?

@lindsay-stevens
Copy link
Contributor Author

@lognaturel Great news about Enketo! I have opened a new PR to fix this error, which occurs when using or_other + translations + group or repeat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The file question type isn't working for the most recent version of Enketo
3 participants