From 74024aaac78392e466ba043a859f6291864a2055 Mon Sep 17 00:00:00 2001
From: "CSDYPHI\\ckdck"
Date: Fri, 18 Oct 2024 14:44:00 +0900
Subject: [PATCH] preparing the 1.8.1.15 update

- update the classifiers
- update readme
---
 README.md               |  93 +++++---------
 statmanager/README.md   | 277 +++++++++++++++++++++++++++++++++-------
 statmanager/__init__.py |   2 +-
 statmanager/setup.py    |  11 +-
 4 files changed, 267 insertions(+), 116 deletions(-)

diff --git a/README.md b/README.md
index 75a525a..aa667a7 100644
--- a/README.md
+++ b/README.md
@@ -11,73 +11,32 @@
 ![logo](./doc/logo.png)
 ### Available Operating Systems
+
 ### Available Python Versions
+
+
-# Statmanager-kr
-### Open-source Statistical Package for Python based on the Pandas.
-### Open-source statistical package for Python and Pandas users
-
-#
-
-Statmanager-kr is developed for researchers, data scientists, psychologist, studends, and anyone who need statistical analysis to validate their hypothesis. The statmanager-kr aims to organize packages that are "convenient to use", "uncompliated to use", and "convenient to see results". The end goal of statmanager-kr is to be a simple and useful package that can be used by people who don't know much about Python and Pandas.
-
-Statmanager-kr was developed for researchers, data analysts, psychologists, students, and others who need statistical analysis to test their hypotheses. statmanager-kr is continuously developed with the aim of being an easy-to-use, uncomplicated statistical package. The ultimate goal of statmanager-kr development is to create a very simple yet useful statistical package that even people who are not familiar with Python and Pandas can use.
+Statmanager-kr is an open-source statistical package for researchers, data scientists, psychologists, students, and anyone who needs statistical analysis. Statmanager-kr aims to be a user-friendly statistical package that can be used easily by people who are unfamiliar with programming languages.
 Currently, KOREAN and ENGLISH are supported.
-The currently supported language settings are Korean and English.
 ## Documentation
-[Official documentation in Korean](https://cslee145.notion.site/cslee145/fd776d4f9a4f4c9db2cf1bbe60726971?v=3b2b237555fc4cd3a41a8da337d80c01)
-[Official Documentation](https://cslee145.notion.site/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4)
-
-
-## Notifications :
-Source codes are available in the [Github respository](https://github.com/ckdckd145/statmanager-kr)
-The source code can be found in the [GitHub repository](https://github.com/ckdckd145/statmanager-kr).
-
-For updates, please see [the notice in the documentation]((https://www.notion.so/cslee145/NOTICEs-4bb2177eeb0f412a81b8dbd3215058e6)) or the [Github release](https://github.com/ckdckd145/statmanager-kr/releases).
-For the update history, please see the [notices](https://www.notion.so/cslee145/NOTICEs-4bb2177eeb0f412a81b8dbd3215058e6) in the official documentation or the [Github release](https://github.com/ckdckd145/statmanager-kr/releases).
-
-### Contribution Guidelines
-
-Please check the [guidelines](https://www.notion.so/cslee145/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4#96a4e9547ae54a41928ff4114729f6c2) in official documentation.
-Please check the contribution [guidelines](https://www.notion.so/cslee145/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4#96a4e9547ae54a41928ff4114729f6c2) in the official documentation.
-
-Please use [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discussions) to let me know if you have any questions, bugs you encounter, suggestions, etc. Of course, you can also email the developer directly.
-Please let us know about any questions, bugs, suggestions, and so on through [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discussions). Of course, you may also email the developer directly.
+[Official Documentation - Korean](https://cslee145.notion.site/fd776d4f9a4f4c9db2cf1bbe60726971?v=3b2b237555fc4cd3a41a8da337d80c01)
+[Official Documentation - English](https://cslee145.notion.site/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4)
-#
-* [Quick Start with sample jupyter notebook file](https://github.com/ckdckd145/statmanager-kr/blob/main/test.ipynb)
-* Available functions | Currently available analyses
-    * [Read detailed instructions](https://www.notion.so/cslee145/Documentation-74a610c12881402d96dc5d1654f97433?pvs=4#be93db7f4159419fa73eb324d6567793) | [View detailed usage](https://www.notion.so/cslee145/dded43262f784c70a37fddb11ec7c9d1?pvs=4#ef9a4aacd8b34b96bd7a4abdea4f5170)
-    1. Normality assumption | normality assumption
-    2. Homoskedasticity assumption | equal-variance assumption
-    3. Reliability | reliability check
-    4. Frequency analysis | frequency analysis
-    5. Correlation analysis | correlation analysis
-    6. Comparison (2) | difference comparison (2)
-    7. Comparison (3) | difference comparison (3)
-    8. Regression
-
-
-* Available functions to make figure or graph | Functions used to create graphs or figures
-    * P-P plot
-    * Q-Q plot
-    * Histogram
-    * Histogram (cumulative)
-    * Pointplot (within differences)
-    * Boxplot (between group difference)
+## Source Code & Dependency
+The source code is available in the [GitHub repository](https://github.com/ckdckd145/statmanager-kr).
 #### Dependency
 * pandas
@@ -88,29 +47,39 @@ Please use [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discu
 * seaborn
 * XlsxWriter
-It is recommended to use the latest versions of these libraries and packages to avoid unexpected errors.
-To prevent unexpected errors, it is recommended to always keep the packages and libraries above updated to their latest versions.
+It is recommended to use the latest versions of these libraries and packages to avoid unexpected errors.
-#### Recommendation
-Using "Jupyter Notebook" is STRONGLY RECOMMENDED (Of course, statmanager-kr works just as well in a Python environment)
-Using "Jupyter Notebook" is strongly recommended. Of course, statmanager-kr also works without problems in a Python environment.
+## Contribution Guidelines
-#### Installing statmanager-kr
-    pip install statmanager-kr
+Please check the [guidelines](https://www.notion.so/cslee145/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4#96a4e9547ae54a41928ff4114729f6c2) in the official documentation.
-#### Updating statmanager-kr
-    pip install statmanager-kr --upgrade
+Please use [GitHub Discussions](https://github.com/ckdckd145/statmanager-kr/discussions) to report questions, bugs, suggestions, or anything else.
 # Quick Start
+[Start with the sample Jupyter notebook](https://github.com/ckdckd145/statmanager-kr/blob/main/test.ipynb)
+[Read the manual in the documentation](https://www.notion.so/cslee145/Documentation-74a610c12881402d96dc5d1654f97433?pvs=4#be93db7f4159419fa73eb324d6567793)
+
+
+### Installation
+```bash
+pip install statmanager-kr
+```
+
+### Update
+```bash
+pip install statmanager-kr --upgrade
+```
+
 ### Import
 ```Python
 import pandas as pd
 from statmanager import Stat_Manager
-df = pd.read_csv('testdf.csv', index_col = 'id')
+# replace 'testdf.csv' with the path to your own data file
+df = pd.read_csv('testdf.csv', index_col = 'id')
 sm = Stat_Manager(df, language = 'eng')
 ```
@@ -276,11 +245,11 @@ The main difference is that `Statmanager-kr` was developed with the goal of bein
 In conclusion, `Statmanager-kr` is a good package for researchers who lack programming experience and knowledge and want to see results quickly. [`Pingouin`](https://pingouin-stats.org/build/html/index.html), on the other hand, is a more suitable package for researchers with more programming experience and knowledge, who need a fine-tuned approach to each analysis method.
-As mentioned above, `Statmanager-kr` was developed to provide statistical analysis methods for hypothesis testing in a user-friendly way, even for those who are not familiar with programming languages such as Python. A representative example of related software that likewise provides user-friendly features is [`Pingouin`](https://pingouin-stats.org/build/html/index.html).
-The biggest difference is that `Statmanager-kr` was developed with the goal of being a package that even researchers who lack programming knowledge or experience can use. To this end, rather than implementing an independent method for each analysis, `Statmanager-kr` is designed so that users can always run a statistical analysis and obtain results by entering the same style of code into a single method. [`Pingouin`](https://pingouin-stats.org/build/html/index.html) shares these user-friendly characteristics, but compared with `Statmanager-kr`, it is better suited to users with relatively more programming experience and knowledge. However, because of this difference, `Statmanager-kr` cannot support fine-tuning analysis methods by adjusting parameters. In contrast, [`Pingouin`](https://pingouin-stats.org/build/html/index.html) is useful for obtaining more careful and appropriate results by adjusting parameters.
+## How to cite?
-In conclusion, `Statmanager-kr` is a package suited to researchers who lack programming experience and knowledge and want to check results quickly. In contrast, [`Pingouin`](https://pingouin-stats.org/build/html/index.html) is better suited to researchers who have relatively extensive programming experience and knowledge and who need fine-grained adjustments for each analysis method.
+To cite Statmanager-kr, please use the following reference:
+* Lee, C. (2024). Statmanager-kr: A User-friendly Statistical Package for Python in Pandas. Journal of Open Source Software, 9(102), 6642. https://doi.org/10.21105/joss.06642
 ## Development: Changseok Lee
diff --git a/statmanager/README.md b/statmanager/README.md
index 2638bc5..6edd0b1 100644
--- a/statmanager/README.md
+++ b/statmanager/README.md
@@ -1,56 +1,42 @@
-![logo](https://github.com/ckdckd145/statmanager-kr/blob/main/doc/logo.png?raw=true)
-
-# statmanager-kr
-### Open-source statistical package for Python based on the Pandas.
-### Open-source statistical package for Python and Pandas users
-#
+[![PyPI version](https://badge.fury.io/py/statmanager-kr.svg)](https://badge.fury.io/py/statmanager-kr)
+[![license](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/ckdckd145/statmanager-kr/blob/main/LICENSE)
+[![status](https://joss.theoj.org/papers/d88c1a10e30fbfc39104534970afcd23/status.svg)](https://joss.theoj.org/papers/d88c1a10e30fbfc39104534970afcd23)
+
+
+
+
-Especially for researchers, data scientists, psychologist, students, and anyone who interested in conducting hypothesis testing. The statmanager-kr aims to organize packages that are "convenient to use", "uncompliated to use", and "convenient to see results". The end goal** **of statmanager-kr is to be a simple and useful package that can be used by people who don't know much about Python and Pandas.
-This package is for researchers, data analysts, psychologists, students, and others who use Pandas and are interested in hypothesis testing. statmanager-kr is developed with the aim of being a package that is easy to use, uncomplicated, and convenient for checking results. The ultimate goal of statmanager-kr development is to create a very simple yet useful package that even people who are not familiar with Python and Pandas can use.
+![logo](../doc/logo.png)
-Currently, KOREAN and ENGLISH are supported.
-The currently supported language settings are Korean and English.
+### Available Operating Systems
+
+
+
+
-## Documentaion
-[Korean documentation](https://cslee145.notion.site/cslee145/fd776d4f9a4f4c9db2cf1bbe60726971?v=3b2b237555fc4cd3a41a8da337d80c01)
-[English Documentation](https://cslee145.notion.site/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4)
+### Available Python Versions
+
+
+
-## Notification :
-Source codes are available in the [Github respository](https://github.com/ckdckd145/statmanager-kr)
-The source code can be found in the [GitHub repository](https://github.com/ckdckd145/statmanager-kr).
+
-For updates, please see [the notice in the documentation]((https://www.notion.so/cslee145/NOTICEs-4bb2177eeb0f412a81b8dbd3215058e6)) or the [Github release](https://github.com/ckdckd145/statmanager-kr/releases).
-For the update history, please see the [notices](https://www.notion.so/cslee145/NOTICEs-4bb2177eeb0f412a81b8dbd3215058e6) in the official documentation or the [Github release](https://github.com/ckdckd145/statmanager-kr/releases).
+Statmanager-kr is an open-source statistical package for researchers, data scientists, psychologists, students, and anyone who needs statistical analysis. Statmanager-kr aims to be a user-friendly statistical package that can be used easily by people who are unfamiliar with programming languages.
-Please use [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discussions) to let me know if you have any questions, bugs you encounter, suggestions, etc. Of course, you can also email the developer directly.
-Please let us know about any questions, bugs, suggestions, and so on through [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discussions). Of course, you may also email the developer directly.
+Currently, KOREAN and ENGLISH are supported.
-#
-* [Quick Start with sample jupyter notebook file](https://github.com/ckdckd145/statmanager-kr/blob/main/test.ipynb)
-* Available functions | Currently available analyses
-    * [Read detailed instructions](https://www.notion.so/cslee145/Documentation-74a610c12881402d96dc5d1654f97433?pvs=4#be93db7f4159419fa73eb324d6567793) | [View detailed usage](https://www.notion.so/cslee145/dded43262f784c70a37fddb11ec7c9d1?pvs=4#ef9a4aacd8b34b96bd7a4abdea4f5170)****
-    1. Normality assumption | normality assumption
-    2. Homoskedasticity assumption | equal-variance assumption
-    3. Reliability | reliability **check**
-    4. Frequency analysis | frequency analysis
-    5. Correlation analysis | correlation analysis
-    6. Comparison (2) | difference comparison (2)
-    7. Comparison (3) | difference comparison (3)
-    8. Regression
-
-
-* Available functions to make figure or graph | Functions used to create graphs or figures
-    * P-P plot
-    * Q-Q plot
-    * Histogram
-    * Histogram (cumulative)
-    * Pointplot (within differences)
-    * Boxplot (between group difference)
+## Documentation
+
+[Official Documentation - Korean](https://cslee145.notion.site/fd776d4f9a4f4c9db2cf1bbe60726971?v=3b2b237555fc4cd3a41a8da337d80c01)
+[Official Documentation - English](https://cslee145.notion.site/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4)
+
+## Source Code & Dependency
+The source code is available in the [GitHub repository](https://github.com/ckdckd145/statmanager-kr).
 #### Dependency
 * pandas
@@ -61,19 +47,212 @@ Please use [Github Discussion](https://github.com/ckdckd145/statmanager-kr/discu
 * seaborn
 * XlsxWriter
-#### Recommendation
-Using "Jupyter Notebook" is STRONGLY RECOMMENDED (Of course, statmanager-kr works just as well in a Python environment)
-Using "Jupyter Notebook" is strongly recommended. Of course, statmanager-kr also works without problems in a Python environment.
+It is recommended to use the latest versions of these libraries and packages to avoid unexpected errors.
+
+## Contribution Guidelines
+
+Please check the [guidelines](https://www.notion.so/cslee145/60cbfcbc90614fe990e02ab8340630cc?v=4991650ae5ce4427a215d1043802f5c0&pvs=4#96a4e9547ae54a41928ff4114729f6c2) in the official documentation.
+
+Please use [GitHub Discussions](https://github.com/ckdckd145/statmanager-kr/discussions) to report questions, bugs, suggestions, or anything else.
+
+
+# Quick Start
+
+[Start with the sample Jupyter notebook](https://github.com/ckdckd145/statmanager-kr/blob/main/test.ipynb)
+[Read the manual in the documentation](https://www.notion.so/cslee145/Documentation-74a610c12881402d96dc5d1654f97433?pvs=4#be93db7f4159419fa73eb324d6567793)
+
+
+### Installation
+```bash
+pip install statmanager-kr
+```
+
+### Update
+```bash
+pip install statmanager-kr --upgrade
+```
+
+### Import
+
+```Python
+import pandas as pd
+from statmanager import Stat_Manager
+
+# replace 'testdf.csv' with the path to your own data file
+df = pd.read_csv('testdf.csv', index_col = 'id')
+sm = Stat_Manager(df, language = 'eng')
+```
+
+### Independent Samples T-test
+
+```python
+sm.progress(method = 'ttest_ind', vars = 'age', group_vars = 'sex').figure()
+```
+
+
+  Output (Click to See)
+
+| | female | male |
+| --- | --- | --- |
+| n | 15.00 | 15.00 |
+| mean | 27.33 | 28.00 |
+| median | 26.00 | 26.00 |
+| sd | 4.88 | 6.94 |
+| min | 21.00 | 20.00 |
+| max | 39.00 | 39.00 |
+
+| dependent variable | t-value | degree of freedom | p-value | 95% CI | Cohen's d |
+| --- | --- | --- | --- | --- | --- |
+| age | -0.304 | 28 | 0.763 | [-5.153, 3.820] | -0.111 |
+
+![figure](./doc/output_ttest_ind.png)
+
+
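The values in this output can be cross-checked from the descriptive table alone. The following is a minimal sketch, assuming a Student's (pooled-variance) t-test, which is consistent with the reported 28 degrees of freedom (15 + 15 - 2), and a Cohen's d that divides the mean difference by the pooled SD. It uses SciPy's `ttest_ind_from_stats` rather than statmanager-kr's own code, and because it starts from the rounded means and SDs, it should agree with the table only to roughly two decimal places.

```python
import math

from scipy import stats

# Rounded summary statistics copied from the descriptive table (female vs. male)
n1, mean1, sd1 = 15, 27.33, 4.88   # female
n2, mean2, sd2 = 15, 28.00, 6.94   # male

# Student's t-test (equal variances assumed), computed from summary statistics
t_value, p_value = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2, equal_var=True)
df = n1 + n2 - 2

# Pooled standard deviation and the standard error of the mean difference
pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)

# 95% confidence interval for (female mean - male mean)
diff = mean1 - mean2
t_crit = stats.t.ppf(0.975, df)
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

# Cohen's d based on the pooled standard deviation (assumed definition)
cohens_d = diff / pooled_sd

print(f"t = {t_value:.3f}, df = {df}, p = {p_value:.3f}")
print(f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}], Cohen's d = {cohens_d:.3f}")
```

The small discrepancies against the table come from the two-decimal inputs; with unrounded group means the mean difference is -0.667 rather than -0.67.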
+
+### Dependent Samples T-test
+
+```python
+sm.progress(method = 'ttest_rel', vars = ['prescore', 'postscore']).figure()
+```
+
+
+  Output (Click to See)
+
+| | prescore | postscore |
+| --- | --- | --- |
+| n | … | … |
+| mean | 5.13 | 4.23 |
+| median | 5.50 | 4.00 |
+| sd | 2.85 | 2.91 |
+| min | … | … |
+| max | … | … |
+
+| variables | t-value | degree of freedom | p-value | 95% CI | Cohen's d |
+| --- | --- | --- | --- | --- | --- |
+| ['prescore', 'postscore'] | 1.198 | 29 | 0.24 | [-0.636, 2.436] | 0.313 |
+
+![figure](./doc/output_ttest_rel.png)
+
+
+
+### Pearson's Correlation
+
+```python
+sm.progress(method = 'pearsonr', vars = ['income', 'prescore', 'age']).figure()
+```
+
+
+  Output (Click to See)
+
+| | n | Pearson's r | p-value | 95%_confidence_interval |
+| --- | --- | --- | --- | --- |
+| income & prescore | 30 | -0.103 | 0.588 | [-0.447, 0.267] |
+| income & age | 30 | -0.051 | 0.789 | [-0.404, 0.315] |
+| prescore & age | 30 | -0.044 | 0.816 | [-0.398, 0.321] |
+
+| | income | prescore | age |
+| --- | --- | --- | --- |
+| income | 1.000 | -0.103 | -0.051 |
+| prescore | -0.103 | 1.000 | -0.044 |
+| age | -0.051 | -0.044 | 1.000 |
+
+![figure](./doc/output_pearsonr.png)
+
+
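The p-values and 95% intervals in the first table are consistent with the standard t-test for a correlation coefficient and the Fisher z-transformation interval. As a minimal sketch under that assumption (not statmanager-kr's implementation), the code below recomputes both for the `income & prescore` row from the rounded r and n, using SciPy's `t` and `norm` distributions.

```python
import math

from scipy import stats

# Values copied from the "income & prescore" row
r, n = -0.103, 30

# Two-sided p-value: t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom
t_value = r * math.sqrt((n - 2) / (1 - r**2))
p_value = 2 * stats.t.sf(abs(t_value), n - 2)

# 95% CI via the Fisher z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3)
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_crit = stats.norm.ppf(0.975)
ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

print(f"r = {r}, p = {p_value:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The interval is computed on the z scale and transformed back with `tanh`, which is why it is not symmetric around r.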
+
+### One-way ANOVA with Post-hoc test
+
+```python
+sm.progress(method = 'f_oneway', vars = 'age', group_vars = 'condition', posthoc = True).figure()
+```
+
+
+  Output (Click to See)
+
+| | test_group | sham_group | control_group |
+| --- | --- | --- | --- |
+| n | 10 | 10 | 10 |
+| mean | 28.5 | 28.3 | 26.2 |
+| median | 27 | 29 | 25.5 |
+| sd | 6.57 | 5.56 | 5.88 |
+| min | … | … | … |
+| max | … | … | … |
+
+| | sum_sq | df | F | p-value | partial eta squared |
+| --- | --- | --- | --- | --- | --- |
+| Intercept | 6864.4 | 1 | 189.469 | 0 | 0.872 |
+| C(condition) | 32.467 | 2 | 0.448 | 0.644 | 0.004 |
+| Residual | 978.2 | 27 | NaN | NaN | 0.124 |
+
+|Test Multiple Comparison ttest_ind FWER=0.05 method=bonf alphacSidak=0.02, alphacBonf=0. | | | | | |
+| --- | --- | --- | --- | --- | --- |
+
+| group1 | group2 | stat | pval | pval_corr | reject |
+| --- | --- | --- | --- | --- | --- |
+| control_group | sham_group | -0.8204 | 0.4227 | 1 | FALSE |
+| control_group | test_group | -0.8246 | 0.4204 | 1 | FALSE |
+| sham_group | test_group | -0.0735 | 0.9422 | 1 | FALSE |
+
+
+![figure](./doc/output_f_oneway.png)
+
+
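The F statistic and p-value for `C(condition)` follow the textbook one-way ANOVA decomposition. As a hedged cross-check (not statmanager-kr's code), the sketch below rebuilds the between-groups sum of squares from the group sizes and means, approximates the within-groups sum of squares from the rounded SDs, and reads the p-value from SciPy's F distribution.

```python
from scipy import stats

# Group sizes, means, and SDs copied from the descriptive table
groups = {
    "test_group": (10, 28.5, 6.57),
    "sham_group": (10, 28.3, 5.56),
    "control_group": (10, 26.2, 5.88),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups SS: sum over groups of n_j * (mean_j - grand_mean)^2
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups SS: sum over groups of (n_j - 1) * sd_j^2 (approximate, since the SDs are rounded)
ss_within = sum((n - 1) * sd**2 for n, _, sd in groups.values())

df_between = len(groups) - 1        # k - 1
df_within = n_total - len(groups)   # N - k

f_value = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_value, df_between, df_within)

print(f"SS_between = {ss_between:.3f}, SS_within = {ss_within:.1f}")
print(f"F({df_between}, {df_within}) = {f_value:.3f}, p = {p_value:.3f}")
```

SS_between comes out almost exactly at the tabled 32.467, while SS_within lands slightly below 978.2 because it is rebuilt from two-decimal SDs.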
+
+### One-way Repeated Measure ANOVA with Post-hoc test
+
+```python
+sm.progress(method = 'f_oneway_rm', vars = ['prescore','postscore','fupscore'], posthoc = True).figure()
+```
+
+
+  Output (Click to See)
+
+| | prescore | postscore | fupscore |
+| --- | --- | --- | --- |
+| n | 30.00 | 30.00 | 30.00 |
+| mean | 5.13 | 4.23 | 4.37 |
+| median | 5.50 | 4.00 | 4.00 |
+| sd | 2.85 | 2.91 | 2.62 |
+| min | … | … | … |
+| max | … | … | … |
+
+| | F Value | Num DF | Den DF | p-value | partial eta squared |
+| --- | --- | --- | --- | --- | --- |
+| variable | 1.079 | 2 | 58 | 0.347 | 0.02 |
+
+
+|Test Multiple Comparison ttest_ind FWER=0.05 method=bonf alphacSidak=0.02, alphacBonf=0. | | | | | |
+| --- | --- | --- | --- | --- | --- |
+
+| group1 | group2 | stat | pval | pval_corr | reject |
+| --- | --- | --- | --- | --- | --- |
+| fupscore | postscore | 0.1866 | 0.8526 | 1 | FALSE |
+| fupscore | prescore | -1.0849 | 0.2824 | 0.8473 | FALSE |
+| postscore | prescore | -1.2106 | 0.231 | 0.6929 | FALSE |
+
+
+![figure](./doc/output_f_oneway_rm.png)
+
+
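The `pval_corr` column is consistent with a Bonferroni adjustment over the three pairwise comparisons (each p-value multiplied by 3 and capped at 1), and the omnibus p-value can be read from the F distribution with (2, 58) degrees of freedom. The sketch below reproduces both from the numbers in the tables using plain Python and SciPy; it is a cross-check under those assumptions, not statmanager-kr's implementation, and last-digit differences come from the rounded inputs.

```python
from scipy import stats

# Omnibus test: F(2, 58) = 1.079, copied from the ANOVA table
p_omnibus = stats.f.sf(1.079, 2, 58)

# Uncorrected post-hoc p-values copied from the "pval" column
pvals = {
    ("fupscore", "postscore"): 0.8526,
    ("fupscore", "prescore"): 0.2824,
    ("postscore", "prescore"): 0.231,
}

# Bonferroni: multiply each p-value by the number of comparisons, capped at 1
m = len(pvals)
alpha = 0.05
pvals_corr = {pair: min(1.0, p * m) for pair, p in pvals.items()}
reject = {pair: p < alpha for pair, p in pvals_corr.items()}

print(f"omnibus p = {p_omnibus:.3f}")
for pair, p in pvals_corr.items():
    print(pair, round(p, 4), reject[pair])
```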
+
+
+
+# Related Software
+
+As mentioned earlier, `Statmanager-kr` was developed to provide a user-friendly way to perform statistical analyses for hypothesis testing, even for researchers who are not familiar with programming languages such as Python. A related package that offers similar user-friendly features is [`Pingouin`](https://pingouin-stats.org/build/html/index.html).
+
+The main difference is that `Statmanager-kr` was developed with the goal of being usable by researchers who lack programming knowledge or experience. To this end, rather than implementing a separate method for each analysis, `Statmanager-kr` is designed so that users always run an analysis and obtain its results by calling a single method (`progress`) with the same code pattern. [`Pingouin`](https://pingouin-stats.org/build/html/index.html) is also user-friendly, but it is better suited to users with more programming experience and knowledge than `Statmanager-kr` targets. Because of this design difference, `Statmanager-kr` does not support fine-tuning analysis methods by adjusting parameters, whereas [`Pingouin`](https://pingouin-stats.org/build/html/index.html) lets users adjust parameters to obtain more carefully tailored results.
+
+In conclusion, `Statmanager-kr` is a good package for researchers who lack programming experience and knowledge and want to see results quickly. [`Pingouin`](https://pingouin-stats.org/build/html/index.html), on the other hand, is more suitable for researchers with more programming experience and knowledge who need a fine-tuned approach to each analysis method.
-#### Installing statmanager-kr
-    pip install statmanager-kr
-#### Updating statmanager-kr
-    pip install statmanager-kr --upgrade
+## How to cite?
+To cite Statmanager-kr, please use the following reference:
+* Lee, C. (2024). Statmanager-kr: A User-friendly Statistical Package for Python in Pandas. Journal of Open Source Software, 9(102), 6642. https://doi.org/10.21105/joss.06642
-# Development: Changseok Lee
+## Development: Changseok Lee