Commit
closes #563; closes #564; closes #565; closes #566; closes #567; closes #568; closes #569; closes #570; closes #571; closes #572; closes #573; closes #574
IndrajeetPatil committed Apr 21, 2021
1 parent 22e7a84 commit c76f2a1
Showing 6 changed files with 61 additions and 61 deletions.
52 changes: 26 additions & 26 deletions paper/paper.Rmd
@@ -41,21 +41,21 @@ knitr::opts_chunk$set(

Graphical displays can reveal problems in a statistical model that might not be
apparent from purely numerical summaries. Such visualizations can also be
-helpful for the reader to evaluate validity of a model if the said analysis is
-reported in a scholarly publication/report. But, given the onerous costs
-involved, researchers can avoid preparing information-rich graphics and
-exploring several statistical approaches/tests available. The `ggstatsplot`
-package in R programming language [@base2021] provides a one-line syntax to
-create densely informative `ggplot2`-based visualizations with the results from
-statistical analysis embedded in the visualization itself. In doing so, the
-package helps researchers adopt a **rigorous, reliable, and robust** data
-exploratory and reporting workflow.
+helpful for the reader to evaluate the validity of a model if it is reported in
+a scholarly publication/report. But, given the onerous costs involved,
+researchers can avoid preparing information-rich graphics and exploring several
+statistical approaches/tests available. The `ggstatsplot` package in R
+programming language [@base2021] provides a one-line syntax to enrich
+`ggplot2`-based visualizations with the results from statistical analysis
+embedded in the visualization itself. In doing so, the package helps researchers
+adopt a rigorous, reliable, and robust data exploratory and reporting
+workflow.

# Statement of Need

In a typical data analysis workflow, data visualization and statistical modeling
-are two different phases: visualization informs modeling, and modeling in its
-turn can suggest a different visualization method, and so on and so forth
+are two different phases: visualization informs modeling, and in turn, modeling
+can suggest a different visualization method, and so on and so forth
[@wickham2016r]. The central idea of `ggstatsplot` is simple: combine these two
phases into one in the form of an informative graphic with statistical details.

@@ -70,19 +70,18 @@ library(ggstatsplot)
ggbetweenstats(penguins, species, body_mass_g)
```
-As can be seen, with a **single** line of code, the function produces details
-about descriptive statistics, inferential statistics, effect size estimate and
-its uncertainty, pairwise comparisons, Bayesian hypothesis testing, Bayesian
+As can be seen, with a single line of code, the function produces details about
+descriptive statistics, inferential statistics, effect size estimate and its
+uncertainty, pairwise comparisons, Bayesian hypothesis testing, Bayesian
posterior estimate and its uncertainty. Moreover, these details are juxtaposed
with informative and well-labeled visualizations. The defaults are designed to
-follow best practices in **both** data visualization [@cleveland1985;
+follow best practices in both data visualization [@cleveland1985;
@grant2018data; @healy2018data; @tufte2001; @wilke2019fundamentals] and
(Frequentist/Bayesian) statistical reporting [@apa2019; @van2020jasp]. Without
`ggstatsplot`, getting these statistical details and customizing a plot would
require significant amount of time and effort In other words, this package
-removes the trade-off often faced by researchers between ease versus
-thoroughness of exploring data and further cements good data
-sanitation/exploration habits.
+removes the trade-off often faced by researchers between ease and thoroughness
+of data exploration and further cements good data exploration habits.
Internally, data cleaning is carried out using `tidyverse` [@Wickham2019], while
statistical analysis is carried out via `statsExpressions` [@Patil2021] and
@@ -105,14 +104,14 @@ c. highlights the importance of the effect by providing effect size measures by
d. provides an easy way to evaluate *absence* of an effect using Bayes factors,
-e. encourages researchers/readers to evaluate statistical assumptions of a model
-in the context of the underlying data (Figure 2),
+e. encourages researchers and readers to evaluate statistical assumptions of a
+model in the context of the underlying data (Figure 2),
f. is easy and simple enough that someone with little-to-no coding experience
can use it without making an error and may even encourage beginners to
-programmatically (instead of using GUI software) analyze data.
+programmatically analyze data, instead of using GUI software.
-```{r reporting, echo=FALSE, fig.cap="Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretive context for the reported statistics."}
+```{r reporting, echo=FALSE, fig.cap="Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretative context for the reported statistics."}
knitr::include_graphics("reporting.png")
```

@@ -130,11 +129,12 @@ collection of statistical analyses and visualizations.

`ggstatsplot` is licensed under the GNU General Public License (v3.0), with all
source code stored at [GitHub](https://github.com/IndrajeetPatil/ggstatsplot/).
-In the spirit of honest and open science, requests/tips for fixes, feature
-updates, as well as general questions and concerns via direct interaction with
-contributors and developers are encouraged by filing an
+In the spirit of honest and open science, requests and suggestions for fixes,
+feature updates, as well as general questions and concerns are encouraged via
+direct interaction with contributors and developers by filing an
[issue](https://github.com/IndrajeetPatil/ggstatsplot/issues) while respecting
-[*Contribution Guidelines*](https://indrajeetpatil.github.io/ggstatsplot/CONTRIBUTING.html).
+[*Contribution
+Guidelines*](https://indrajeetpatil.github.io/ggstatsplot/CONTRIBUTING.html).

# Acknowledgements

4 changes: 2 additions & 2 deletions paper/paper.aux
@@ -31,8 +31,8 @@
\newlabel{benefits}{{}{2}{Benefits}{section*.4}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{Future Scope}{2}{section*.6}\protected@file@percent }
\newlabel{future-scope}{{}{2}{Future Scope}{section*.6}{}}
-\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretive context for the reported statistics.\relax }}{3}{figure.caption.5}\protected@file@percent }
-\newlabel{fig:reporting}{{2}{3}{Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretive context for the reported statistics.\relax }{figure.caption.5}{}}
+\@writefile{lof}{\defcounter {refsection}{0}\relax }\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretative context for the reported statistics.\relax }}{3}{figure.caption.5}\protected@file@percent }
+\newlabel{fig:reporting}{{2}{3}{Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretative context for the reported statistics.\relax }{figure.caption.5}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{Licensing and Availability}{3}{section*.7}\protected@file@percent }
\newlabel{licensing-and-availability}{{}{3}{Licensing and Availability}{section*.7}{}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{Acknowledgements}{3}{section*.8}\protected@file@percent }
12 changes: 6 additions & 6 deletions paper/paper.log
@@ -1,4 +1,4 @@
-This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2020) (preloaded format=xelatex 2021.3.22) 5 APR 2021 09:45
+This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2020) (preloaded format=xelatex 2021.3.22) 21 APR 2021 14:51
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@@ -1153,7 +1153,7 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) \addtolength{\topmargin}{-0.95425pt}.

LaTeX Font Info: Font shape `TU/lmss/m/it' in size <8> not available
-(Font) Font shape `TU/lmss/m/sl' tried instead on input line 372.
+(Font) Font shape `TU/lmss/m/sl' tried instead on input line 371.
[1

]
@@ -1210,10 +1210,10 @@ Package logreq Info: Writing requests to 'paper.run.xml'.

)
Here is how much of TeX's memory you used:
-34928 strings out of 478922
-702613 string characters out of 5866020
-1508855 words of memory out of 5000000
-54220 multiletter control sequences out of 15000+600000
+34927 strings out of 478922
+702590 string characters out of 5866020
+1508857 words of memory out of 5000000
+54219 multiletter control sequences out of 15000+600000
410294 words of font info for 74 fonts, out of 8000000 for 9000
14 hyphenation exceptions out of 8191
95i,13n,103p,1194b,804s stack positions out of 5000i,500n,10000p,200000b,80000s
54 changes: 27 additions & 27 deletions paper/paper.md
@@ -15,7 +15,7 @@ authors:
affiliations:
- name: Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
index: 1
-date: "2021-04-05"
+date: "2021-04-21"
bibliography: paper.bib
---

@@ -25,21 +25,21 @@ bibliography: paper.bib

Graphical displays can reveal problems in a statistical model that might not be
apparent from purely numerical summaries. Such visualizations can also be
-helpful for the reader to evaluate validity of a model if the said analysis is
-reported in a scholarly publication/report. But, given the onerous costs
-involved, researchers can avoid preparing information-rich graphics and
-exploring several statistical approaches/tests available. The `ggstatsplot`
-package in R programming language [@base2021] provides a one-line syntax to
-create densely informative `ggplot2`-based visualizations with the results from
-statistical analysis embedded in the visualization itself. In doing so, the
-package helps researchers adopt a **rigorous, reliable, and robust** data
-exploratory and reporting workflow.
+helpful for the reader to evaluate the validity of a model if it is reported in
+a scholarly publication/report. But, given the onerous costs involved,
+researchers can avoid preparing information-rich graphics and exploring several
+statistical approaches/tests available. The `ggstatsplot` package in R
+programming language [@base2021] provides a one-line syntax to enrich
+`ggplot2`-based visualizations with the results from statistical analysis
+embedded in the visualization itself. In doing so, the package helps researchers
+adopt a rigorous, reliable, and robust data exploratory and reporting
+workflow.

# Statement of Need

In a typical data analysis workflow, data visualization and statistical modeling
-are two different phases: visualization informs modeling, and modeling in its
-turn can suggest a different visualization method, and so on and so forth
+are two different phases: visualization informs modeling, and in turn, modeling
+can suggest a different visualization method, and so on and so forth
[@wickham2016r]. The central idea of `ggstatsplot` is simple: combine these two
phases into one in the form of an informative graphic with statistical details.

@@ -59,19 +59,18 @@ ggbetweenstats(penguins, species, body_mass_g)
\includegraphics[width=1\linewidth]{paper_files/figure-latex/penguins-1} \caption{Example plot from the `ggstatsplot` package illustrates its philosophy of juxtaposing informative visualizations with details from statistical analysis. To see all supported plots and statistical analyses, see the package website: https://indrajeetpatil.github.io/ggstatsplot/}\label{fig:penguins}
\end{figure}

-As can be seen, with a **single** line of code, the function produces details
-about descriptive statistics, inferential statistics, effect size estimate and
-its uncertainty, pairwise comparisons, Bayesian hypothesis testing, Bayesian
+As can be seen, with a single line of code, the function produces details about
+descriptive statistics, inferential statistics, effect size estimate and its
+uncertainty, pairwise comparisons, Bayesian hypothesis testing, Bayesian
posterior estimate and its uncertainty. Moreover, these details are juxtaposed
with informative and well-labeled visualizations. The defaults are designed to
-follow best practices in **both** data visualization [@cleveland1985;
+follow best practices in both data visualization [@cleveland1985;
@grant2018data; @healy2018data; @tufte2001; @wilke2019fundamentals] and
(Frequentist/Bayesian) statistical reporting [@apa2019; @van2020jasp]. Without
`ggstatsplot`, getting these statistical details and customizing a plot would
require significant amount of time and effort In other words, this package
-removes the trade-off often faced by researchers between ease versus
-thoroughness of exploring data and further cements good data
-sanitation/exploration habits.
+removes the trade-off often faced by researchers between ease and thoroughness
+of data exploration and further cements good data exploration habits.

Internally, data cleaning is carried out using `tidyverse` [@Wickham2019], while
statistical analysis is carried out via `statsExpressions` [@Patil2021] and
@@ -94,15 +93,15 @@ c. highlights the importance of the effect by providing effect size measures by

d. provides an easy way to evaluate *absence* of an effect using Bayes factors,

-e. encourages researchers/readers to evaluate statistical assumptions of a model
-in the context of the underlying data (Figure 2),
+e. encourages researchers and readers to evaluate statistical assumptions of a
+model in the context of the underlying data (Figure 2),

f. is easy and simple enough that someone with little-to-no coding experience
can use it without making an error and may even encourage beginners to
-programmatically (instead of using GUI software) analyze data.
+programmatically analyze data, instead of using GUI software.

\begin{figure}
-\includegraphics[width=1\linewidth]{reporting} \caption{Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretive context for the reported statistics.}\label{fig:reporting}
+\includegraphics[width=1\linewidth]{reporting} \caption{Comparing the 'Standard' approach of reporting statistical analysis in a publication/report with the 'ggstatsplot' approach of reporting the same analysis next to an informative graphic. Note that the results described in the 'Standard' approach are about the 'Dinosaur' dataset plotted on the right. Without the accompanying visualization, it is hard to evaluate the validity of the results. The ideal reporting practice will be a hybrid of these two approaches where the plot contains both the visual and numerical summaries about a statistical model, while the narrative provides interpretative context for the reported statistics.}\label{fig:reporting}
\end{figure}

# Future Scope
@@ -119,11 +118,12 @@ collection of statistical analyses and visualizations.

`ggstatsplot` is licensed under the GNU General Public License (v3.0), with all
source code stored at [GitHub](https://github.com/IndrajeetPatil/ggstatsplot/).
-In the spirit of honest and open science, requests/tips for fixes, feature
-updates, as well as general questions and concerns via direct interaction with
-contributors and developers are encouraged by filing an
+In the spirit of honest and open science, requests and suggestions for fixes,
+feature updates, as well as general questions and concerns are encouraged via
+direct interaction with contributors and developers by filing an
[issue](https://github.com/IndrajeetPatil/ggstatsplot/issues) while respecting
-[*Contribution Guidelines*](https://indrajeetpatil.github.io/ggstatsplot/CONTRIBUTING.html).
+[*Contribution
+Guidelines*](https://indrajeetpatil.github.io/ggstatsplot/CONTRIBUTING.html).

# Acknowledgements

Binary file modified paper/paper.pdf
Binary file not shown.
Binary file modified paper/paper_files/figure-latex/penguins-1.pdf
Binary file not shown.

2 comments on commit c76f2a1

@lintr-bot

data-raw/Titanic_full.R:16:10: warning: 1:nrow(...) is likely to be wrong in the empty edge case, use seq_len.

df[rep(1:nrow(df), rep), ]
         ^

R/ggbarstats.R:44:1: style: functions should have cyclomatic complexity of less than 15, this has 16.

ggbarstats <- function(data,
^

R/ggbetweenstats.R:165:1: style: functions should have cyclomatic complexity of less than 15, this has 24.

ggbetweenstats <- function(data,
^

R/ggcoefstats.R:133:1: style: functions should have cyclomatic complexity of less than 15, this has 43.

ggcoefstats <- function(x,
^

R/ggcorrmat.R:97:1: style: functions should have cyclomatic complexity of less than 15, this has 18.

ggcorrmat <- function(data,
^

R/ggpiestats.R:71:1: style: functions should have cyclomatic complexity of less than 15, this has 29.

ggpiestats <- function(data,
^

R/ggscatterstats.R:96:1: style: functions should have cyclomatic complexity of less than 15, this has 16.

ggscatterstats <- function(data,
^

R/ggwithinstats.R:75:1: style: functions should have cyclomatic complexity of less than 15, this has 27.

ggwithinstats <- function(data,
^

tests/testthat/test-ggbetweenstats.R:91:13: style: Commented code should be removed.

#       bold("only significant")
            ^~~~~~~~~~~~~~~~~~~~~~~~

tests/testthat/test-ggwithinstats.R:176:17: style: Commented code should be removed.

#       bold("only significant")
                ^~~~~~~~~~~~~~~~~~~~~~~~

@lintr-bot

data-raw/Titanic_full.R:16:10: warning: 1:nrow(...) is likely to be wrong in the empty edge case, use seq_len.

df[rep(1:nrow(df), rep), ]
         ^

R/ggbarstats.R:44:1: style: functions should have cyclomatic complexity of less than 15, this has 16.

ggbarstats <- function(data,
^

R/ggbetweenstats.R:165:1: style: functions should have cyclomatic complexity of less than 15, this has 24.

ggbetweenstats <- function(data,
^

R/ggcoefstats.R:133:1: style: functions should have cyclomatic complexity of less than 15, this has 43.

ggcoefstats <- function(x,
^

R/ggcorrmat.R:97:1: style: functions should have cyclomatic complexity of less than 15, this has 18.

ggcorrmat <- function(data,
^

R/ggcorrmat.R:159:3: warning: local variable ‘corr.nature’ assigned but may not be used

corr.nature <- ifelse(isTRUE(partial), "correlation (partial):", "correlation:")
  ^~~~~~~~~~~

R/ggcorrmat.R:191:3: warning: local variable ‘getmode’ assigned but may not be used

getmode <- function(v) {
  ^~~~~~~

R/ggpiestats.R:71:1: style: functions should have cyclomatic complexity of less than 15, this has 29.

ggpiestats <- function(data,
^

R/ggscatterstats.R:96:1: style: functions should have cyclomatic complexity of less than 15, this has 16.

ggscatterstats <- function(data,
^

R/ggwithinstats.R:75:1: style: functions should have cyclomatic complexity of less than 15, this has 27.

ggwithinstats <- function(data,
^

tests/testthat/test-ggbetweenstats.R:91:13: style: Commented code should be removed.

#       bold("only significant")
            ^~~~~~~~~~~~~~~~~~~~~~~~

tests/testthat/test-ggwithinstats.R:176:17: style: Commented code should be removed.

#       bold("only significant")
                ^~~~~~~~~~~~~~~~~~~~~~~~
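The first lintr-bot warning flags `df[rep(1:nrow(df), rep), ]` in `data-raw/Titanic_full.R`. When `df` has zero rows, `1:nrow(df)` becomes `1:0`, which evaluates to `c(1, 0)` rather than an empty index. `seq_len(nrow(df))` returns `integer(0)` in that case. The sketch below shows the suggested rewrite. It is not the repository's code: the helper name `expand_rows()` and its `counts` argument are hypothetical, and it assumes the second `rep` in the flagged line is a vector of per-row frequencies.

```r
# Hypothetical helper illustrating lintr's seq_len() suggestion; not taken
# from data-raw/Titanic_full.R. `counts` is assumed to hold one frequency per row.
expand_rows <- function(df, counts) {
  # seq_len(0) is integer(0), so a zero-row input yields a zero-row output;
  # 1:nrow(df) would instead produce c(1, 0) for an empty data frame.
  row_index <- rep(seq_len(nrow(df)), counts)
  # drop = FALSE keeps a data frame even when df has a single column
  df[row_index, , drop = FALSE]
}

# Usage: two rows repeated 2 and 1 times give three rows
freq_table <- data.frame(class = c("1st", "2nd"), stringsAsFactors = FALSE)
nrow(expand_rows(freq_table, c(2, 1)))  # 3

# Usage: the empty edge case returns an empty data frame
nrow(expand_rows(freq_table[0, , drop = FALSE], integer(0)))  # 0
```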
