[Feat][Spark] Implementation of PySpark bindings to Scala API (#300)
* Implementation draft & concept

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   .gitignore
	new file:   pyspark/graphar_pysaprk/__init__.py
	new file:   pyspark/graphar_pysaprk/enums.py
	new file:   pyspark/graphar_pysaprk/graph.py
	new file:   pyspark/graphar_pysaprk/info.py
	new file:   pyspark/graphar_pysaprk/reader.py
	new file:   pyspark/graphar_pysaprk/writer.py
	new file:   pyspark/poetry.lock
	new file:   pyspark/pyproject.toml

* Part of tests & update branch

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	renamed:    pyspark/graphar_pysaprk/__init__.py -> pyspark/graphar_pyspark/__init__.py
	renamed:    pyspark/graphar_pysaprk/enums.py -> pyspark/graphar_pyspark/enums.py
	renamed:    pyspark/graphar_pysaprk/graph.py -> pyspark/graphar_pyspark/graph.py
	renamed:    pyspark/graphar_pysaprk/info.py -> pyspark/graphar_pyspark/info.py
	renamed:    pyspark/graphar_pysaprk/reader.py -> pyspark/graphar_pyspark/reader.py
	renamed:    pyspark/graphar_pysaprk/writer.py -> pyspark/graphar_pyspark/writer.py
	modified:   pyspark/poetry.lock
	modified:   pyspark/pyproject.toml
	new file:   pyspark/tests/__init__.py
	new file:   pyspark/tests/conftest.py
	new file:   pyspark/tests/test_enums.py
	new file:   pyspark/tests/test_info.py

* Update VertexInfo.load_vertex_info & test & fixes

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   .gitignore
	modified:   pyspark/graphar_pyspark/__init__.py
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/poetry.lock
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/test_info.py

* Push changes before pulling from upstream

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	new file:   pyspark/README.rst
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/test_info.py

* Tests + fixes + updates from comments

- update pyproject.toml
- fix a lot of things
- some work based on comments
- license header everywhere
- minor changes

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	new file:   pyspark/Makefile
	modified:   pyspark/graphar_pyspark/graph.py
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/graphar_pyspark/reader.py
	modified:   pyspark/poetry.lock
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/__init__.py
	modified:   pyspark/tests/conftest.py
	modified:   pyspark/tests/test_enums.py
	modified:   pyspark/tests/test_info.py
	new file:   pyspark/tests/test_reader.py

 Changes not staged for commit:
	modified:   spark/pom.xml
	modified:   spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala
	modified:   spark/src/main/scala/com/alibaba/graphar/GraphInfo.scala
	modified:   spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala

* Fix init for GraphArSession

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/__init__.py

 Changes not staged for commit:
	modified:   spark/pom.xml
	modified:   spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala
	modified:   spark/src/main/scala/com/alibaba/graphar/GraphInfo.scala
	modified:   spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala

* Tests and fixes from comments

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/__init__.py
	modified:   pyspark/graphar_pyspark/enums.py
	modified:   pyspark/graphar_pyspark/graph.py
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/graphar_pyspark/reader.py
	new file:   pyspark/graphar_pyspark/util.py
	modified:   pyspark/graphar_pyspark/writer.py
	modified:   pyspark/tests/test_info.py
	modified:   pyspark/tests/test_reader.py
	new file:   pyspark/tests/test_writer.py

* Make PR ready for review

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/util.py
	modified:   pyspark/graphar_pyspark/writer.py
	modified:   pyspark/tests/test_writer.py

* Fixes from comments && docs

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   .gitignore
	modified:   docs/Makefile
	modified:   docs/index.rst
	new file:   docs/pyspark/api/graphar_pyspark.rst
	new file:   docs/pyspark/api/modules.rst
	new file:   docs/pyspark/index.rst
	new file:   docs/pyspark/pyspark-lib.rst
	modified:   pyspark/Makefile
	modified:   pyspark/graphar_pyspark/__init__.py
	modified:   pyspark/graphar_pyspark/enums.py
	new file:   pyspark/graphar_pyspark/errors.py
	modified:   pyspark/graphar_pyspark/graph.py
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/graphar_pyspark/reader.py
	modified:   pyspark/graphar_pyspark/util.py
	modified:   pyspark/graphar_pyspark/writer.py
	modified:   pyspark/poetry.lock
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/__init__.py
	modified:   pyspark/tests/conftest.py
	modified:   pyspark/tests/test_enums.py
	modified:   pyspark/tests/test_info.py
	modified:   pyspark/tests/test_reader.py
	modified:   pyspark/tests/test_writer.py

* Add license-header to pyspark/README

+ add poetry-lock file to .licenserc ignore section

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   .licenserc.yaml
	new file:   pyspark/README.md
	deleted:    pyspark/README.rst
	modified:   pyspark/pyproject.toml

* Update tests && small fixes

- new tests
- improved coverage
- updated Makefile for Python project
- updated pyproject.toml

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/Makefile
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/graphar_pyspark/writer.py
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/test_info.py
	modified:   pyspark/tests/test_writer.py

* Drop outdated comment and TODO

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/writer.py

* Fix broken commit

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/tests/test_writer.py

* Tests coverage 95% && docstrings && linting pass

- ruff passed
- coverage 95%+
- docstrings for all the public API

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/__init__.py
	modified:   pyspark/graphar_pyspark/enums.py
	modified:   pyspark/graphar_pyspark/errors.py
	modified:   pyspark/graphar_pyspark/graph.py
	modified:   pyspark/graphar_pyspark/info.py
	modified:   pyspark/graphar_pyspark/reader.py
	modified:   pyspark/graphar_pyspark/util.py
	modified:   pyspark/graphar_pyspark/writer.py
	modified:   pyspark/poetry.lock
	modified:   pyspark/pyproject.toml
	modified:   pyspark/tests/test_reader.py
	new file:   pyspark/tests/test_transform.py
	modified:   pyspark/tests/test_writer.py

* Ci & docs

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	new file:   .github/workflows/pyspark.yml
	new file:   docs/pyspark/how-to.rst
	modified:   docs/pyspark/index.rst

* Update branch && update README

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/README.md

* Fixes from comments

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   .github/workflows/pyspark.yml
	modified:   pyspark/README.md

* Fix linter errors

 On branch 297-add-pyspark-bindings
 Changes to be committed:
	modified:   pyspark/graphar_pyspark/info.py
SemyonSinchenko authored Jan 10, 2024
1 parent 955b3a5 commit 2faccd8
Showing 29 changed files with 6,326 additions and 1 deletion.
62 changes: 62 additions & 0 deletions .github/workflows/pyspark.yml
@@ -0,0 +1,62 @@
# Copyright 2022-2023 Alibaba Group Holding Limited.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: GraphAr PySpark CI

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

concurrency:
  group: ${{ github.repository }}-${{ github.event.number || github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

jobs:
  GraphAr-spark:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true

      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.9

      - name: Install Poetry
        uses: abatilo/actions-poetry@v2

      - name: Install Spark Scala && PySpark
        run: |
          cd pyspark
          make install_test
      - name: Run PyTest
        run: |
          cd pyspark
          make test
      - name: Lint
        run: |
          cd pyspark
          make install_lint
          make lint
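
The `concurrency.group` expression in the workflow above chains three values with `||`, which in GitHub Actions expressions yields the first truthy operand. The sketch below mimics that fallback in plain Python to show which key ends up in the group name. The function name, the parameter names, and the `owner/repo` values are hypothetical and are not part of the commit; it only illustrates the expression and is not how GitHub evaluates it internally.

```python
from typing import Optional


def concurrency_group(
    repository: str,
    workflow: str,
    sha: str,
    event_number: Optional[int] = None,
    head_ref: Optional[str] = None,
) -> str:
    """Build a group name the way the workflow expression does (illustrative only)."""
    # First truthy value wins: PR number, then PR head branch, then commit SHA.
    key = event_number or head_ref or sha
    return f"{repository}-{key}-{workflow}"


# A pull-request run is keyed by the PR number (hypothetical values).
pr_group = concurrency_group("owner/repo", "CI", "abc123", event_number=300, head_ref="feature")

# A push run has neither a PR number nor a head ref, so the SHA is used.
push_group = concurrency_group("owner/repo", "CI", "abc123")
```

With `cancel-in-progress: true`, a newer run that resolves to the same group name cancels the older one, so repeated pushes to one pull request do not pile up, while push runs keyed by distinct SHAs do not cancel each other.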
63 changes: 63 additions & 0 deletions .gitignore
@@ -6,3 +6,66 @@
.ccls-cache

compile_commands.json

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
pyspark/assets

# Jupyter Notebook
.ipynb_checkpoints
*.ipynb


# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Ruff
.ruff_cache

### Scala ###
*.bloop
*.metals
3 changes: 2 additions & 1 deletion .licenserc.yaml
@@ -40,11 +40,12 @@ header:
    - '*.md'
    - '*.rst'
    - '**/*.json'
    - 'pyspark/poetry.lock' # This file is generated automatically by Poetry-tool; there is no way to add license header

  comment: on-failure

# If you don't want to check dependencies' license compatibility, remove the following part
dependency:
  files:
    - spark/pom.xml # If this is a maven project.
    - java/pom.xml # If this is a maven project.
22 changes: 22 additions & 0 deletions docs/Makefile
@@ -53,3 +53,25 @@ html: cpp-apidoc spark-apidoc
		--quiet
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

.PHONY: pyspark-apidoc
pyspark-apidoc:
	cd $(ROOTDIR)/pyspark && \
	poetry run sphinx-apidoc -o $(ROOTDIR)/docs/pyspark/api graphar_pyspark/

.PHONY: html-poetry
html-poetry:
	cd $(ROOTDIR)/pyspark && \
	poetry run bash -c "cd $(ROOTDIR)/docs && $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html"
	rm -fr $(BUILDDIR)/html/spark/reference
	cp -fr $(ROOTDIR)/spark/target/site/scaladocs $(BUILDDIR)/html/spark/reference/
	cd $(ROOTDIR)/java && \
	mvn -P javadoc javadoc:aggregate \
		-Dmaven.antrun.skip=true \
		-DskipTests \
		-Djavadoc.output.directory=$(ROOTDIR)/docs/$(BUILDDIR)/html/java/ \
		-Djavadoc.output.destDir=reference \
		--quiet
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

1 change: 1 addition & 0 deletions docs/index.rst
@@ -21,6 +21,7 @@
   C++ <cpp/index>
   Java <java/index>
   Spark <spark/index>
   PySpark <pyspark/index>

.. toctree::
   :maxdepth: 2
69 changes: 69 additions & 0 deletions docs/pyspark/api/graphar_pyspark.rst
@@ -0,0 +1,69 @@
graphar\_pyspark package
========================

Submodules
----------

graphar\_pyspark.enums module
-----------------------------

.. automodule:: graphar_pyspark.enums
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.errors module
------------------------------

.. automodule:: graphar_pyspark.errors
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.graph module
-----------------------------

.. automodule:: graphar_pyspark.graph
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.info module
----------------------------

.. automodule:: graphar_pyspark.info
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.reader module
------------------------------

.. automodule:: graphar_pyspark.reader
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.util module
----------------------------

.. automodule:: graphar_pyspark.util
   :members:
   :undoc-members:
   :show-inheritance:

graphar\_pyspark.writer module
------------------------------

.. automodule:: graphar_pyspark.writer
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: graphar_pyspark
   :members:
   :undoc-members:
   :show-inheritance:
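
The file above follows a regular pattern: one titled section per submodule, each holding an `automodule` directive with the same three options. As a rough illustration of that layout, the sketch below rebuilds one such section in Python. The helper name `automodule_section` is hypothetical, and this is not how `sphinx-apidoc` is implemented; it only mirrors the shape of the generated text, including the indentation that reStructuredText requires for directive options.

```python
def automodule_section(qualified_name: str) -> str:
    """Return one reStructuredText section for a module (illustrative only)."""
    # Underscores are escaped in the title so they are not read as markup.
    title = qualified_name.replace("_", "\\_") + " module"
    lines = [
        title,
        "-" * len(title),  # the underline must be at least as long as the title
        "",
        f".. automodule:: {qualified_name}",
        "   :members:",           # options are indented under the directive
        "   :undoc-members:",
        "   :show-inheritance:",
    ]
    return "\n".join(lines)


# Submodule names as listed in the generated file above.
submodules = ["enums", "errors", "graph", "info", "reader", "util", "writer"]
document = "\n\n".join(automodule_section(f"graphar_pyspark.{name}") for name in submodules)
```

Since the `pyspark-apidoc` Makefile target generates this file, running that target again after adding a module is likely more reliable than editing the file by hand.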
7 changes: 7 additions & 0 deletions docs/pyspark/api/modules.rst
@@ -0,0 +1,7 @@
graphar_pyspark
===============

.. toctree::
   :maxdepth: 4

   graphar_pyspark