Text visualisation type #1035

Closed
4 tasks done
RaoOfPhysics opened this issue Jul 19, 2024 · 10 comments


@RaoOfPhysics
Member

RaoOfPhysics commented Jul 19, 2024

  • Text view type
  • Implement Drawable
  • Implement Reflect
  • “Standalone” entry point for demo (remove from develop)

See also:


@rolyp and I have had a few thoughts on this, and I’m attempting here to note some of those in a way that hopefully makes sense.

This was partly inspired by my doctoral thesis (a-ch.in/ty-a#research), where, rather than write numbers by hand, I “injected” the results of calculations straight into the text. For example, the source R Markdown file (.Rmd) in question had the following text+code:

The gender breakdown of the respondents[^gender] is listed in Table \@ref(tab:gender-breakdown), together with that for the total number of scientists in the CMS collaboration.[^missing-gender]
Therefore, `r round(cms_male_response_count/cms_total_male*100,2)`% of all male CMS scientists and `r round(cms_female_response_count/cms_total_female*100,2)`% of all female CMS scientists responded to the survey.
The differences in the response rates are significant, as determined by the Chi-squared test without the Yates’s correction (_p_-value: `r round(prop$p.value,4)`; 95% CI: `r round(prop$conf.int,4)`).

[^gender]: At the time of collecting these data, CMS only allowed its members the choice of one of two genders, based on their passports or national ID cards.
I am not aware if this policy has changed since then.

[^missing-gender]: One of the scientists had an empty gender field in the database.

The output looks like this:

[Screenshot: the rendered passage in the thesis PDF, “Particle physics and public engagement a match made in minuscule matter”, CERN-THESIS-2022-306]
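
For readers who don’t use R Markdown, here is a rough sketch of the same “inject the calculation into the sentence” idea in Python. It is only an illustration: the counts are made up (they are not the thesis figures), and `response_rate` is a hypothetical helper, not something from the thesis code.

```python
# Hypothetical counts, chosen only to make the sketch runnable;
# they are NOT the figures from the thesis.
male_responses, total_male = 30, 120
female_responses, total_female = 15, 40


def response_rate(responses: int, total: int, digits: int = 2) -> float:
    """Percentage of `total` that responded, rounded to `digits` decimal places."""
    return round(responses / total * 100, digits)


# The prose is written once; the numbers are computed where they are used.
sentence = (
    f"Therefore, {response_rate(male_responses, total_male)}% of all male "
    f"scientists and {response_rate(female_responses, total_female)}% of all "
    f"female scientists responded to the survey."
)
print(sentence)
# prints: Therefore, 25.0% of all male scientists and 37.5% of all female scientists responded to the survey.
```

If the underlying counts change, the sentence changes with them, which is the property we want the “text” visualisation to have.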

In another instance, I had three statistical tests run thrice to confirm if the dataset in question was suitable for the intended analysis. Similar text would need to explain this each time. Rather than write the sentences and numbers by hand, I wrote a set of functions in R, one for each test, which would output the appropriate text whether the test passed or failed, and would include the calculated numbers corresponding to the test:

# ====== Return statements for EFA ====== #

library(magrittr)  # provides the %>% pipe
library(tibble)    # provides as_tibble()

## KMO

report_kmo <- function(category) {
    structure <- parameters::check_factorstructure(category)
    KMO <- structure$KMO$MSA %>% round(2)
    # Different sentence depending on whether the KMO value passes the 0.5 cut-off
    if (KMO < 0.5) {
        glue::glue("The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that factor analysis is likely to be inappropriate (KMO = {KMO}).")
    } else {
        glue::glue("The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that the data seem appropriate for factor analysis (KMO = {KMO}).")
    }
}

## Bartlett’s test for sphericity

report_sphericity <- function(category) {
    structure <- parameters::check_factorstructure(category)
    chisq <- structure$sphericity$chisq %>% round(2)
    dof <- structure$sphericity$dof
    p_val <- structure$sphericity$p
    p_formatted <- insight::format_p(p_val)
    # Different sentence depending on whether the test is significant
    if (p_val < 0.001) {
        glue::glue("Bartlett's test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis ($\\chi$^2^({dof}) = {chisq}, {p_formatted}).")
    } else {
        glue::glue("Bartlett's test of sphericity suggests that there is not enough significant correlation in the data for factor analysis ($\\chi$^2^({dof}) = {chisq}, {p_formatted}).")
    }
}

## Cronbach’s alpha

report_alpha <- function(category, iterations = 50) {
    set.seed(19480717)  # fixed seed so the bootstrap is reproducible
    alpha_all <- psych::alpha(category, n.iter = iterations, check.keys = TRUE)
    alpha_df <- alpha_all$boot.ci %>% as_tibble()
    alpha_lower <- alpha_df[1,1] %>% round(2)
    alpha_median <- alpha_df[2,1] %>% round(2)
    alpha_upper <- alpha_df[3,1] %>% round(2)
    glue::glue("Cronbach's $\\alpha$, based on {iterations} iterations, is {alpha_median} (lower: {alpha_lower}; upper: {alpha_upper}).")
}

Then, wherever I had to include the results, I added the following to the R Markdown file (.Rmd) of the analysis chapter:

```{r benefits-checks}
#| cache: TRUE
kmo_benefits <- report_kmo(benefits)
sphericity_benefits <- report_sphericity(benefits)
alpha_benefits <- report_alpha(benefits)
```

1. `r kmo_benefits`
1. `r sphericity_benefits`
1. `r alpha_benefits`

The output of that particular bit of code appears in the PDF as follows:
[Screenshot: the three rendered sentences in the thesis PDF, “Particle physics and public engagement a match made in minuscule matter”, CERN-THESIS-2022-306]

(Other instances are on p. 82 and p. 86.)
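
To make the pattern explicit without needing the R packages, here is a minimal Python sketch of a report function that picks its wording from a test result. It assumes the KMO value has already been computed elsewhere; the 0.5 cut-off and the wording are taken from the R function above, while the input values in the usage lines are hypothetical.

```python
def report_kmo(kmo: float, threshold: float = 0.5) -> str:
    """Return a sentence describing a KMO value that has already been computed."""
    kmo = round(kmo, 2)
    # Choose the wording from the outcome, then inject the number itself.
    if kmo < threshold:
        verdict = "factor analysis is likely to be inappropriate"
    else:
        verdict = "the data seem appropriate for factor analysis"
    return (
        "The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy "
        f"suggests that {verdict} (KMO = {kmo})."
    )


# Hypothetical inputs, one on each side of the cut-off.
print(report_kmo(0.42))
print(report_kmo(0.81))
```

The same function covers both the pass and the fail case, so the prose never has to be rewritten by hand when the data change.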


What we are proposing is the following:

  1. A “text” type of “visualisation” that can be inserted into prose, as in the first example above. An inline span in backticks would start with the keyword fluid, followed by the “visualisation” type and its parameters (`fluid <some code>`), and would be replaced by the calculated figure in the displayed HTML file. This will need a bit of Fluid code similar to the R package nombre, which converts from numbers to text and vice versa: https://nombre.rossellhayes.com/.
  2. The “text” visualisation needs to be linked to other visualisations where relevant: clicking on a data point in a visualisation should highlight all of the relevant sentences/paragraphs (possibly using <mark>). When specific points are selected in the visualisation, the text should adapt itself based on the underlying data. This can be used, for example, for dynamic captions, but also for longer paragraphs.
    • A related idea is to have linked visualisations “appear” in the sidebar, so users don’t have to scroll back and forth to and from the relevant image.
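
A very rough sketch of how the two points could fit together, in Python rather than Fluid and purely as an illustration. Assumptions: the inline span only names a value (`fluid <name>`) instead of containing real Fluid code, `as_words` is a toy stand-in for what nombre does, and highlighting wraps just the injected value in <mark> rather than a whole sentence or paragraph.

```python
import html
import re

# Matches inline spans of the form `fluid <name>`; this syntax is hypothetical.
SPAN = re.compile(r"`fluid\s+([A-Za-z_]\w*)`")

SMALL_NUMBERS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "ten"]


def as_words(value) -> str:
    """Spell out integers from 0 to 10; fall back to str() for anything else."""
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 10:
        return SMALL_NUMBERS[value]
    return str(value)


def render(text: str, data: dict, selected: frozenset = frozenset()) -> str:
    """Replace each span with its value, marking spans whose input is selected."""
    def substitute(match):
        name = match.group(1)
        shown = html.escape(as_words(data[name]))
        # A selected data point highlights the text that depends on it.
        return f"<mark>{shown}</mark>" if name in selected else shown
    return SPAN.sub(substitute, text)


prose = "We ran `fluid n_tests` tests on `fluid n_rows` rows."
data = {"n_tests": 3, "n_rows": 250}  # hypothetical values

print(render(prose, data))
# prints: We ran three tests on 250 rows.
print(render(prose, data, selected=frozenset({"n_rows"})))
# prints: We ran three tests on <mark>250</mark> rows.
```

The point of the sketch is only that each piece of injected text knows which data it came from, which is what makes the linking in point 2 possible.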
@RaoOfPhysics
Member Author

Related to: explorable-viz/fluid-examples#10

@RaoOfPhysics
Member Author

Use AI instead of hand-crafting prose: https://github.com/explorable-viz/research-strategy/issues/165

@rolyp rolyp changed the title from "[Feature] “Text” as a type of visualisation that can be inserted into prose" to "Text visualisation type" Jul 19, 2024
@rolyp rolyp added this to Fluid Jul 19, 2024
@github-project-automation github-project-automation bot moved this to Proposed in Fluid Jul 19, 2024
@rolyp rolyp moved this from Proposed to Planned in Fluid Jul 19, 2024
@rolyp rolyp added this to the v0.7.5 Summer Internships 2024 milestone Jul 19, 2024
@rolyp
Collaborator

rolyp commented Jul 31, 2024

@JosephBond This is the new visualisation type we’ll need to add – I think it can be based on similar principles to our existing viz types, but simpler. Initially it won’t be easy to use, as you’ll need to have a specific uniquely named div for each Text element, but hopefully there will be some relatively easy things we can do to improve this.

@rolyp rolyp moved this from Planned to Pending in Fluid Jul 31, 2024
@JosephBond
Collaborator

JosephBond commented Aug 1, 2024

Going to sketch some of my design ideas here over today and tomorrow.

  • Add constructor for DisplayText type
    • First, need to store some sort of sequential type mixing regular text and values
    • To start off, no proper parser; instead, on the PureScript side, List (Either Text Val)
    • Second step, make the text interactable on the JS side
    • Then, need to adapt the type of expressions representable in the DisplayText, so we can map values to natural-language expressions
    • Then look at usability:
    • What kind of suggestions do we want? This is where the copilot-style stuff could come in
    • Instead of building pages using divs with appropriate IDs, we are going to want a more usable mechanism to provide these things
    • Instead of a list of Eithers, we want DisplayText { } to be able to use a backquote "`quote" mechanism like in Lisp; that means invoking the parser on quoted expressions a second time, within the scope of evaluating the DisplayText { }
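
To pin down what the List (Either Text Val) idea means, here is a minimal sketch in Python (not PureScript, and not the actual Fluid types). The class names mirror the ones above, but the `source` field and both helper functions are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Text:
    """Literal prose, shown as written."""
    content: str


@dataclass(frozen=True)
class Val:
    """A computed value; `source` (hypothetical) names the data it came from."""
    value: object
    source: str


Fragment = Union[Text, Val]


def render(fragments: List[Fragment]) -> str:
    """Flatten the sequence into a single string, in order."""
    parts = []
    for fragment in fragments:
        if isinstance(fragment, Text):
            parts.append(fragment.content)
        else:
            parts.append(str(fragment.value))
    return "".join(parts)


def sources(fragments: List[Fragment]) -> List[str]:
    """Names of the data that the computed fragments depend on."""
    return [f.source for f in fragments if isinstance(f, Val)]


display_text = [Text("KMO = "), Val(0.81, "kmo"), Text(".")]  # hypothetical value
print(render(display_text))   # prints: KMO = 0.81.
print(sources(display_text))  # prints: ['kmo']
```

Keeping the values as separate fragments, rather than formatting everything into one string up front, is what would let the JS side make them interactable later.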

@JosephBond
Collaborator

Whilst in theory this is "simpler" than other visualizations, it's actually not simpler at all. We need either some sort of polymorphism in the DisplayText type, or a point at which we collect all the values that are present and convert them into strings. It's entirely unclear where to do this, because the way we pack/unpack types is really opaque.
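
For comparison, this is what the “convert every value to a string in one place” option looks like in a language with dispatch on the runtime type. It is a Python sketch using `functools.singledispatch`, and it says nothing about how Fluid packs or unpacks values; the function name `to_text` and the formatting choices (two decimals, yes/no) are assumptions.

```python
from functools import singledispatch


@singledispatch
def to_text(value) -> str:
    """Fallback: refuse values we have no textual rendering for."""
    raise TypeError(f"no text rendering for {type(value).__name__}")


@to_text.register
def _(value: str) -> str:
    return value


@to_text.register
def _(value: bool) -> str:
    # bool is a subclass of int, so it needs its own, more specific case.
    return "yes" if value else "no"


@to_text.register
def _(value: int) -> str:
    return str(value)


@to_text.register
def _(value: float) -> str:
    return f"{value:.2f}"


print(to_text(3))     # prints: 3
print(to_text(0.5))   # prints: 0.50
print(to_text(True))  # prints: yes
```

Whatever form this takes on our side, the useful property is the single conversion point: adding a new kind of value means adding one case, not touching every place that builds text.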

@RaoOfPhysics
Member Author

From my private message to @JosephBond:


OK, so here’s my understanding. The inline chunks on their own are a bit more limited than what we’re trying to do. Most of the text is pre-written with the exception of what is being injected between the inline backticks.

I think these are the right bits of relevant code:

Not exactly sure how it works, but here is some explanation in the R Markdown book: https://bookdown.org/yihui/rmarkdown/r-code.html

Also worth poking at the Knitr book: https://yihui.org/knitr/

@rolyp
Collaborator

rolyp commented Aug 8, 2024

@JosephBond I’ve had a very quick scan of the above. We can chat more later today, but perhaps we should conceptually separate “atomic visual elements” (visualisations and fragments of text) from the top-level document structure (which might be a sequence of such things, in a literate programming sort of style). For now, I think we can concentrate on the former. We already have visualisations which are computed from data, so we “just” need to add text computed from data. Then we can insert those text elements into existing HTML documents, just as we currently do with visualisations.

I think your List (Text + Val) type above is closer to the latter, and would allow the entire content of a web page to be a rendered Fluid value containing both text and graphics (the literate programming “document”). That’s an important perspective to keep in mind, but hopefully we can avoid having to design it right now.

@rolyp rolyp moved this from Pending to In Progress in Fluid Aug 8, 2024
@rolyp
Collaborator

rolyp commented Aug 8, 2024

As an example adapted from @RaoOfPhysics’s report_sphericity code, suppose the index.html contained:

<p>Bartlett’s test of sphericity suggests that there is <div id="sig-123">?</div> correlation in ... 

We could then (naively) compute a Text element in Fluid using the following library function:

let sufficient n threshold =
    Text(if n >= threshold then "sufficient" else "insufficient")

or even:

let sufficient n threshold =
    let s = "sufficient" in Text(if n >= threshold then s else "in" ++ s)

which when plugged into the div with id sig-123, would be rendered as (selectable) text to create the final HTML.
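
The “plugging in” step can be mimicked outside Fluid with a small Python sketch. This is only an illustration of the data flow: `fill_placeholder` is a hypothetical helper, it uses a regular expression rather than a real HTML parser, it assumes the placeholder div contains no nested divs, and the page string and input numbers are made up.

```python
import html
import re


def sufficient(n: float, threshold: float) -> str:
    """Python counterpart of the second Fluid definition above."""
    s = "sufficient"
    return s if n >= threshold else "in" + s


def fill_placeholder(page: str, element_id: str, text: str) -> str:
    """Replace the content of <div id="element_id">...</div> with escaped text."""
    pattern = re.compile(
        r'(<div id="' + re.escape(element_id) + r'">)(.*?)(</div>)', re.DOTALL
    )
    replacement = html.escape(text)
    # Keep the opening and closing tags, swap only what is between them.
    return pattern.sub(
        lambda m: m.group(1) + replacement + m.group(3), page, count=1
    )


page = '<p>There is <div id="sig-123">?</div> correlation.</p>'  # hypothetical page
print(fill_placeholder(page, "sig-123", sufficient(0.2, 0.5)))
# prints: <p>There is <div id="sig-123">insufficient</div> correlation.</p>
```

The id is the only thing tying the computed text to its place in the page, which is why each Text element currently needs its own uniquely named div.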

@rolyp
Collaborator

rolyp commented Aug 8, 2024

@JosephBond Started capturing a few subtasks in the issue body.

@rolyp
Collaborator

rolyp commented Aug 12, 2024

@JosephBond Have extracted Examples to a new task, to give us a place to capture some initial ideas.

@rolyp rolyp closed this as completed Aug 21, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Recently completed in Fluid Aug 21, 2024
@rolyp rolyp moved this from Recently completed to Done in Fluid Sep 2, 2024