feat: add query reader #7

Merged
merged 1 commit into from
Aug 20, 2024
2 changes: 1 addition & 1 deletion .gitignore
@@ -129,4 +129,4 @@ dmypy.json
.pyre/

# pdm
.pdm.toml
.pdm-python
2 changes: 0 additions & 2 deletions .pdm.toml

This file was deleted.

53 changes: 41 additions & 12 deletions README.md
@@ -1,26 +1,19 @@
<img alt="NebulaGraph NetworkX Adaptor(ng_nx)" src="https://user-images.githubusercontent.com/1651790/227207918-7c023215-b7cf-4aa5-b734-bc50411dab77.png">

<p align="center">
<em>Manipulation of graphs in NebulaGraph using the NetworkX API.</em>
<em>Manipulate and analyze NebulaGraph data using the NetworkX API</em>
</p>

<p align="center">
<a href="LICENSE" target="_blank">
<img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License">
</a>

<a href="https://badge.fury.io/py/ng_nx" target="_blank">
<img src="https://badge.fury.io/py/ng_nx.svg" alt="PyPI version">
</a>

<a href="https://pdm.fming.dev" target="_blank">
<img src="https://img.shields.io/badge/pdm-managed-blueviolet" alt="pdm-managed">
</a>

<!-- <a href="https://github.com/wey-gu/nebulagraph-nx/actions/workflows/ci.yml">
<img src="https://github.com/wey-gu/nebulagraph-nx/actions/workflows/ci.yml/badge.svg" alt="Tests">
</a> -->

</p>

---
@@ -31,13 +24,15 @@

---

NebulaGraph NetworkX (ng_nx) is a tool that allows you to use the NetworkX API for manipulating graphs in NebulaGraph. It makes it easy to analyze and manipulate graphs using NebulaGraph's advanced capabilities while still using the familiar NetworkX interface. In short, ng_nx bridges the gap between NebulaGraph and NetworkX.
NebulaGraph NetworkX (ng_nx) is a powerful tool that bridges NebulaGraph and NetworkX, enabling you to leverage NetworkX's rich set of graph algorithms and analysis tools on data stored in NebulaGraph. This integration combines NebulaGraph's advanced storage capabilities with NetworkX's extensive graph analysis functionality.

## Quick Start

Prepare for a NebulaGraph cluster within Colab in 5 mins following https://github.com/nebula-contrib/nebulagraph-lite.
### Prerequisites

Ensure you have a NebulaGraph cluster running. For a quick setup, you can use [NebulaGraph Lite](https://github.com/nebula-contrib/nebulagraph-lite) to set up a cluster in Colab within 5 minutes.

### Install
### Installation

```bash
pip install ng_nx
@@ -123,7 +118,41 @@ louvain_writer.set_options(
louvain_writer.write()
```

### Using NebulaQueryReader

The `NebulaQueryReader` allows you to execute any NebulaGraph query and construct a NetworkX graph from the result.

```python
from ng_nx import NebulaQueryReader
from ng_nx.utils import NebulaGraphConfig

config = NebulaGraphConfig(
    space="demo_basketballplayer",
    graphd_hosts="127.0.0.1:9669",
    metad_hosts="127.0.0.1:9559"
)

reader = NebulaQueryReader(nebula_config=config)

# Execute a custom query
query = "MATCH p=(v:player{name:'Tim Duncan'})-[e:follow*1..3]->(v2) RETURN p"
g = reader.read(query)
```

This approach allows you to leverage the full power of NebulaGraph's query language while still being able to analyze the results using NetworkX.
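
As a rough illustration of that last step, the sketch below runs two ordinary NetworkX calls on a small hand-built `MultiDiGraph`. The graph is only a stand-in for what `reader.read(query)` might return: the vertex IDs, names, and `degree` values are made up, and no NebulaGraph connection is involved.

```python
import networkx as nx

# Hand-built stand-in for the graph returned by reader.read(query).
# All IDs and property values here are illustrative assumptions.
g = nx.MultiDiGraph()
g.add_node("player100", name="Tim Duncan", labels=["player"])
g.add_node("player101", name="Tony Parker", labels=["player"])
g.add_node("player125", name="Manu Ginobili", labels=["player"])
g.add_edge("player100", "player101", key="follow", degree=95)
g.add_edge("player100", "player125", key="follow", degree=95)
g.add_edge("player101", "player125", key="follow", degree=90)

# Which vertex is followed most often within this subgraph?
in_degrees = dict(g.in_degree())
most_followed = max(in_degrees, key=in_degrees.get)

# Every vertex reachable from Tim Duncan along directed edges.
reachable = nx.descendants(g, "player100")

print(most_followed, sorted(reachable))
```

Any algorithm that accepts a directed multigraph can be applied the same way once the graph has been built.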

## Readers

NG-NX provides three types of readers to fetch data from NebulaGraph:

1. `NebulaReader`: Reads a graph from NebulaGraph based on specified edges and properties, returning a NetworkX graph. It uses the MATCH clause internally to fetch data from NebulaGraph.

2. `NebulaQueryReader`: Executes a custom NebulaGraph query and constructs a NetworkX graph from the result. This reader is particularly useful when you need to perform complex queries or have specific data retrieval requirements.

3. `NebulaScanReader` (Coming soon): Will read graph data from NebulaGraph using a configuration similar to `NebulaReader`, but it will bypass the MATCH clause and utilize the SCAN interface with the Storage Client for potentially improved performance on large datasets.

Each reader is designed to cater to different use cases, providing flexibility in how you interact with and retrieve data from NebulaGraph for analysis with NetworkX.

## Documentation

[API Reference](https://github.com/wey-gu/nebulagraph-nx/blob/main/docs/API.md)
[API Reference](https://github.com/wey-gu/nebulagraph-nx/blob/main/docs/API.md)
5 changes: 3 additions & 2 deletions ng_nx/__init__.py
@@ -1,11 +1,11 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2023 The NebulaGraph Authors. All rights reserved.
# Copyright 2024 The NebulaGraph Authors. All rights reserved.

from pkgutil import extend_path

__path__ = extend_path(__path__, __name__) # type: ignore

from ng_nx.query_reader import NebulaReader
from ng_nx.query_reader import NebulaReader, NebulaQueryReader
from ng_nx.scan_reader import NebulaScanReader
from ng_nx.writer import NebulaWriter

@@ -14,4 +14,5 @@
    "NebulaReader",
    "NebulaScanReader",
    "NebulaWriter",
    "NebulaQueryReader",
)
54 changes: 53 additions & 1 deletion ng_nx/query_reader.py
@@ -1,10 +1,11 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2023 The NebulaGraph Authors. All rights reserved.
# Copyright 2024 The NebulaGraph Authors. All rights reserved.

import networkx as nx
import pandas as pd
from nebula3.Config import Config
from nebula3.gclient.net import ConnectionPool
from nebula3.data.ResultSet import ResultSet

from ng_nx.utils import NebulaGraphConfig, result_to_df

@@ -76,3 +77,54 @@ def release(self):

    def __del__(self):
        self.release()

class NebulaQueryReader:
    def __init__(self, nebula_config: NebulaGraphConfig):
        self.config = nebula_config
        self.connection_pool = ConnectionPool()
        graphd_hosts = nebula_config.graphd_hosts.split(",")
        graphd_host_list = [
            (host.split(":")[0], int(host.split(":")[1])) for host in graphd_hosts
        ]
        config = Config()
        assert self.connection_pool.init(
            graphd_host_list, config
        ), "Init Connection Pool Failed"

    def read(self, query: str) -> nx.MultiDiGraph:
        with self.connection_pool.session_context(
            self.config.user, self.config.password
        ) as session:
            assert session.execute(
                f"USE {self.config.space}"
            ).is_succeeded(), f"Failed to use space {self.config.space}"

            result: ResultSet = session.execute(query)
            assert result.is_succeeded(), f"Query execution failed: {result.error_msg()}"

            vis_data = result.dict_for_vis()
            return self._construct_graph(vis_data)

    def _construct_graph(self, vis_data: dict) -> nx.MultiDiGraph:
        g = nx.MultiDiGraph()

        # Add nodes
        for node_data in vis_data['nodes']:
            g.add_node(node_data['id'], **node_data['props'], labels=node_data['labels'])

        # Add edges
        for edge_data in vis_data['edges']:
            g.add_edge(
                edge_data['src'],
                edge_data['dst'],
                key=edge_data['name'],
                **edge_data['props']
            )

        return g

    def release(self):
        self.connection_pool.close()

    def __del__(self):
        self.release()
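
The two pure-data steps of the class above can be tried without a running cluster. The following is a minimal sketch, not the code from this PR: `parse_hosts` and `build_graph` are hypothetical helper names, `rpartition` is swapped in for the repeated `split(":")` calls, and the sample dictionary only assumes the keys the diff itself reads (`nodes`, `edges`, `id`, `labels`, `props`, `src`, `dst`, `name`). The real `dict_for_vis()` output may carry more fields.

```python
from typing import Dict, List, Tuple

import networkx as nx


def parse_hosts(hosts: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) tuples."""
    parsed = []
    for item in hosts.split(","):
        # rpartition splits on the last colon, so the port is always the tail.
        host, _, port = item.strip().rpartition(":")
        parsed.append((host, int(port)))
    return parsed


def build_graph(vis_data: Dict) -> nx.MultiDiGraph:
    """Build a MultiDiGraph from a dict shaped like the one read in the diff."""
    g = nx.MultiDiGraph()
    for node in vis_data["nodes"]:
        g.add_node(node["id"], labels=node["labels"], **node["props"])
    for edge in vis_data["edges"]:
        # The edge name is used as the multigraph key, as in the diff.
        g.add_edge(edge["src"], edge["dst"], key=edge["name"], **edge["props"])
    return g


# Hypothetical sample data; IDs and property values are made up.
sample = {
    "nodes": [
        {"id": "player100", "labels": ["player"], "props": {"name": "Tim Duncan"}},
        {"id": "player101", "labels": ["player"], "props": {"name": "Tony Parker"}},
    ],
    "edges": [
        {"src": "player100", "dst": "player101", "name": "follow", "props": {"degree": 95}},
        {"src": "player100", "dst": "player101", "name": "follow", "props": {"degree": 80}},
    ],
}

hosts = parse_hosts("127.0.0.1:9669, 10.0.0.2:9669")
graph = build_graph(sample)
print(hosts, graph.number_of_nodes(), graph.number_of_edges())
```

One consequence of keying edges by name alone: in the sample, the two `follow` edges between the same pair of vertices share a key, so NetworkX keeps a single edge and the second call updates its attributes. If parallel edges of the same type need to stay distinct, the key would have to include something that tells them apart.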
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -79,16 +79,16 @@ filter_files = true

[project]
name = "ng_nx"
version = "0.1.9"
description = "NebulaGraph NetowrkX adaptor"
version = "0.2.0"
description = "NebulaGraph NetowrkX Adaptor"
authors = [
{name = "Wey Gu", email = "[email protected]"},
]
# only numpy==1.21.6, scipy==1.7.3 could work on m1 mac, yet work on py37
# ng_ai need to work with pyspark 2.4.x, which only support py37
dependencies = [
"networkx>=2.5.1",
"nebula3-python>=3.4.0",
"nebula3-python>=3.8.2",
"pandas>=1.3.5",
"numpy>=1.21.6",
"scipy>=1.7.3",