Version 3 with cached cross chunk edges #454

Open
wants to merge 105 commits into
base: main
105 commits
d8b3c70
fix: cleanup ingest code
akhileshh Aug 6, 2023
76667a3
add ttl column family
akhileshh Aug 6, 2023
084d642
fix: new l2 cx edge attribute
akhileshh Aug 6, 2023
428365e
feat: post process sv cross edges
akhileshh Aug 6, 2023
a439b29
fix: use longer expiry for debugging
akhileshh Aug 11, 2023
87edec2
feat(ingest): read l2 cross edges
akhileshh Aug 11, 2023
5402c35
feat(ingest): postprocess job handling
akhileshh Aug 12, 2023
42e2b58
fix(ingest): status
akhileshh Aug 12, 2023
49286cd
fix: timedelta import
akhileshh Aug 12, 2023
2f060cd
fix(ingest): status
akhileshh Aug 12, 2023
c234e79
fix(ingest): use hypenated names for valid dns
akhileshh Aug 12, 2023
1920dfd
fix: rename attr; better var names
akhileshh Aug 20, 2023
515147c
fix: rename attr; better var names
akhileshh Aug 20, 2023
43e1e06
fix: add more docs; better var names
akhileshh Aug 20, 2023
decb4a9
fix: move cross_edges module to ingest module; only used in ingest
akhileshh Aug 20, 2023
2684453
fix: reduce mem use; var names; remove unused code
akhileshh Aug 20, 2023
cf75901
fix: adds cg typehint
akhileshh Aug 20, 2023
3b52527
fix: reduce loc
akhileshh Aug 20, 2023
7a95a5b
fix: use shorter name
akhileshh Aug 20, 2023
586f4e0
feat: cache cx edges at each layer
akhileshh Aug 20, 2023
1486cac
fix: convert array type
akhileshh Aug 20, 2023
1695514
fix: use atomic edges during ingest
akhileshh Aug 20, 2023
29283d1
fix: tests
akhileshh Aug 20, 2023
f120409
fix: remove postprocess step
akhileshh Aug 20, 2023
0898b32
fix: raises specific error
akhileshh Aug 20, 2023
262378f
fix: removes dangerous default value
akhileshh Aug 21, 2023
aa82d22
wip: read from cached edges
akhileshh Aug 21, 2023
172f897
wip: edits refactor
akhileshh Aug 21, 2023
156f2cd
wip: edits refactor
akhileshh Aug 21, 2023
88ffbf2
fix(ingest): cache cross chunk edges from children
akhileshh Aug 22, 2023
c5ddd1b
feat: add unique flag
akhileshh Aug 22, 2023
397a438
feat: cross edges column family gcversionrule
akhileshh Aug 22, 2023
0ab0759
fix: convert input to np arrays
akhileshh Aug 22, 2023
0fcf524
fix: linting issues
akhileshh Aug 22, 2023
c5e18d0
wip: edits refactor
akhileshh Aug 22, 2023
50bb03b
fix: undo gcrule changes
akhileshh Aug 23, 2023
5967596
fix: add mock_edges; linting issues
akhileshh Aug 23, 2023
005b027
feat: edits using cached cross edges
akhileshh Aug 23, 2023
db3911d
fix: use function for dry code
akhileshh Aug 24, 2023
7522de4
fix: mask skipped nodes
akhileshh Aug 28, 2023
f7a6031
fix: use the correct layer variable
akhileshh Aug 28, 2023
815c22b
fix: redis pipeline for lower latency
akhileshh Aug 29, 2023
b3ea907
fix: pass redis connection
akhileshh Aug 29, 2023
c9281c8
fix: version update for deployment
akhileshh Aug 29, 2023
43971e2
fix: status print padding
akhileshh Aug 29, 2023
94cb711
fix: filter active edges for split, add timestamp for reading cross c…
akhileshh Aug 30, 2023
51b592c
fix: get roots no cache flag
akhileshh Aug 30, 2023
70abf72
fix: parent and roots no cache
akhileshh Aug 31, 2023
a2c027c
fix: out edges here dont refer to edges crossing chunk
akhileshh Aug 31, 2023
2cab759
fix: missing timestamps
akhileshh Sep 2, 2023
206e282
fix: consolidate neighbor nodes cx edge updates
akhileshh Sep 8, 2023
737e3ef
fix: set to list for np.array
akhileshh Sep 8, 2023
39d16ba
fix: use copy=False where possible; some cleanup
akhileshh Sep 8, 2023
97eee3e
fix: attribute type must be np.array
akhileshh Sep 8, 2023
9138967
fix(ingest): worker details in status
akhileshh Sep 9, 2023
37b497d
fix: handle empty input
akhileshh Sep 9, 2023
1fc55e4
fix: use empty array instead
akhileshh Sep 9, 2023
9a4d2b2
fix: missed time_stamp
akhileshh Sep 10, 2023
68581f0
fix: only consolidate cx_edge writes; update per new_id
akhileshh Sep 10, 2023
0f95e0d
fix: reset parent layer in loop
akhileshh Sep 11, 2023
8e14031
fix(ingest): use get_roots with ceil=False instead of get_parents
akhileshh Sep 11, 2023
96a8a20
fix(ingest): incorrect stop_layer
akhileshh Sep 11, 2023
c8498bc
fix: add safeguard to against data corruption
akhileshh Sep 12, 2023
2963ff3
add another safeguard
akhileshh Sep 12, 2023
607e34d
feat: log operation_id in errors
akhileshh Sep 12, 2023
e79b689
fix: remove temp error
akhileshh Sep 12, 2023
2ba6827
add more safeguards
akhileshh Sep 12, 2023
3c23f7e
fix: circular import
akhileshh Sep 12, 2023
0d9d090
fix: consider layer 2 as well
akhileshh Sep 12, 2023
399e090
fix(edits): incorrect order of opeartions; documentation
akhileshh Sep 13, 2023
6c707c0
feat(ingest): add tests command
akhileshh Sep 15, 2023
9d9887b
fix(edits): make sure to add reverse edges
akhileshh Sep 26, 2023
4a0cba8
fix(edits): read neighbor cx edges from cache
akhileshh Sep 26, 2023
3b5c2bc
fix(edits): check for no cx edges; comments
akhileshh Sep 27, 2023
2d3441b
fix(edits): update neighbor cx edges in a skipped layer
akhileshh Oct 3, 2023
3131a0d
fix(edits): make sure to update all skipped neighbors
akhileshh Oct 11, 2023
fbc7874
fix(edits): ignore new ids in neighbor update
akhileshh Oct 11, 2023
7e229ab
add docs
akhileshh Oct 12, 2023
e04a7eb
fix: resolve column filter ambiguity
akhileshh Jan 14, 2024
4e1ce08
fix: resolve column filter ambiguity(2)
akhileshh Jan 14, 2024
475cd42
V3 migration (#484)
akhileshh May 12, 2024
9e49fd7
reset version v3
akhileshh May 12, 2024
b42a59c
breakup long fn
akhileshh May 12, 2024
fb0e5d3
gh actions for pcgv3
akhileshh May 15, 2024
19a1a67
update split tests (#497)
akhileshh May 25, 2024
b171f2e
segregate update nodes logic
akhileshh Jun 10, 2024
a72d0ff
fix(edits): overwrite children partners when superseded by parents
akhileshh Jun 28, 2024
ea65ca6
fix: unique edges always, predecing edit ts, allow same segment merge
akhileshh Jul 4, 2024
7432d8a
Bump version: 3.0.0 → 3.0.1
akhileshh Jul 4, 2024
bd4dd27
fix(edits): mask all descendants when updating cx edges
akhileshh Jul 6, 2024
02c727d
Bump version: 3.0.1 → 3.0.2
akhileshh Jul 6, 2024
9b0694e
fix(edits): use supervoxels to get the correct cross edge parents
akhileshh Jul 7, 2024
257ad9e
Bump version: 3.0.2 → 3.0.3
akhileshh Jul 7, 2024
d1dbdae
fix(edits/split): filter out inactive cross edges
akhileshh Jul 16, 2024
c6002b0
fix(edits/split): filter out inactive cross edges AT EACH LAYER
akhileshh Jul 17, 2024
1609624
migration debug code
akhileshh Aug 30, 2024
8268398
use parent timestamps to lift cx edges
akhileshh Sep 22, 2024
c93efe9
make dynamic mesh dir graph specific
akhileshh Sep 23, 2024
d5fa9fe
fix(upgrade): use hierarchy from supervoxels
akhileshh Sep 26, 2024
1037341
fix(upgrade): include cx edges at node_ts explicitly
akhileshh Sep 26, 2024
f1100ad
adds job type guard, flush_redis prompts, improved status output
akhileshh Sep 29, 2024
e62390a
fix(upgrade): include timestamps for partner supervoxel parents
akhileshh Nov 10, 2024
b8bcc3c
fix(upgrade): use timestamps of partners at layers > 2
akhileshh Nov 21, 2024
53b8e41
version 3.0.9
akhileshh Dec 5, 2024
47f2d2f
feat: use mesh dir and dynamic dir from metadata
akhileshh Dec 9, 2024
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 2.18.3
+current_version = 3.0.10
 commit = True
 tag = True

2 changes: 2 additions & 0 deletions .github/workflows/main.yml
@@ -4,9 +4,11 @@ on:
   push:
     branches:
       - "main"
+      - "pcgv3"
   pull_request:
     branches:
       - "main"
+      - "pcgv3"

 jobs:
   unit-tests:
2 changes: 1 addition & 1 deletion pychunkedgraph/__init__.py
@@ -1 +1 @@
-__version__ = "2.18.3"
+__version__ = "3.0.10"
2 changes: 2 additions & 0 deletions pychunkedgraph/app/__init__.py
@@ -105,6 +105,8 @@ def configure_app(app):
     with app.app_context():
         from ..ingest.rq_cli import init_rq_cmds
         from ..ingest.cli import init_ingest_cmds
+        from ..ingest.cli_upgrade import init_upgrade_cmds

         init_rq_cmds(app)
         init_ingest_cmds(app)
+        init_upgrade_cmds(app)
60 changes: 0 additions & 60 deletions pychunkedgraph/debug/cross_edge_test.py

This file was deleted.

78 changes: 0 additions & 78 deletions pychunkedgraph/debug/existence_test.py

This file was deleted.

54 changes: 0 additions & 54 deletions pychunkedgraph/debug/family_test.py

This file was deleted.

52 changes: 40 additions & 12 deletions pychunkedgraph/debug/utils.py
@@ -1,7 +1,8 @@
+# pylint: disable=invalid-name, missing-docstring, bare-except, unidiomatic-typecheck
+
 import numpy as np

-from ..graph import ChunkedGraph
-from ..graph.utils.basetypes import NODE_ID
+from pychunkedgraph.graph.meta import ChunkedGraphMeta, GraphConfig


 def print_attrs(d):
@@ -16,28 +17,55 @@ def print_attrs(d):
             print(v)


-def print_node(
-    cg: ChunkedGraph,
-    node: NODE_ID,
-    indent: int = 0,
-    stop_layer: int = 2,
-) -> None:
+def print_node(cg, node: np.uint64, indent: int = 0, stop_layer: int = 2) -> None:
     children = cg.get_children(node)
     print(f"{' ' * indent}{node}[{len(children)}]")
     if cg.get_chunk_layer(node) <= stop_layer:
         return
     for child in children:
-        print_node(cg, child, indent=indent + 1, stop_layer=stop_layer)
+        print_node(cg, child, indent=indent + 4, stop_layer=stop_layer)


-def get_l2children(cg: ChunkedGraph, node: NODE_ID) -> np.ndarray:
-    nodes = np.array([node], dtype=NODE_ID)
+def get_l2children(cg, node: np.uint64) -> np.ndarray:
+    nodes = np.array([node], dtype=np.uint64)
     layers = cg.get_chunk_layers(nodes)
-    assert np.all(layers > 2), "nodes must be at layers > 2"
+    assert np.all(layers >= 2), "nodes must be at layers >= 2"
     l2children = []
     while nodes.size:
         children = cg.get_children(nodes, flatten=True)
         layers = cg.get_chunk_layers(children)
         l2children.append(children[layers == 2])
         nodes = children[layers > 2]
     return np.concatenate(l2children)
+
+
+def sanity_check(cg, new_roots, operation_id):
+    """
+    Check for duplicates in hierarchy, useful for debugging.
+    """
+    # print(f"{len(new_roots)} new ids from {operation_id}")
+    l2c_d = {}
+    for new_root in new_roots:
+        l2c_d[new_root] = get_l2children(cg, new_root)
+    success = True
+    for k, v in l2c_d.items():
+        success = success and (len(v) == np.unique(v).size)
+        # print(f"{k}: {np.unique(v).size}, {len(v)}")
+    if not success:
+        raise RuntimeError("Some ids are not valid.")
+
+
+def sanity_check_single(cg, node, operation_id):
+    v = get_l2children(cg, node)
+    msg = f"invalid node {node}:"
+    msg += f" found {len(v)} l2 ids, must be {np.unique(v).size}"
+    assert np.unique(v).size == len(v), f"{msg}, from {operation_id}."
+    return v
+
+
+def update_graph_id(cg, new_graph_id:str):
+    old_gc = cg.meta.graph_config._asdict()
+    old_gc["ID"] = new_graph_id
+    new_gc = GraphConfig(**old_gc)
+    new_meta = ChunkedGraphMeta(new_gc, cg.meta.data_source, cg.meta.custom_data)
+    cg.update_meta(new_meta, overwrite=True)
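
The duplicate check that `sanity_check` and `sanity_check_single` perform can be sketched without a real graph. The snippet below is only an illustration under stated assumptions: `CHILDREN`, `LAYERS`, `l2_descendants`, and `duplicated_l2_ids` are hypothetical stand-ins, not PyChunkedGraph APIs, and plain dictionaries replace the `cg.get_children` / `cg.get_chunk_layers` calls. It walks down from a node above layer 2, collects every layer-2 descendant while keeping repeats, and reports ids reached more than once.

```python
from collections import Counter

# Hypothetical toy hierarchy (parent id -> child ids); not real graph data.
CHILDREN = {
    30: [20, 21],
    31: [21, 22],  # 21 is deliberately shared with node 30 to simulate corruption
    40: [30, 31],
    41: [30],
}
# Hypothetical layer lookup for every id used above.
LAYERS = {20: 2, 21: 2, 22: 2, 30: 3, 31: 3, 40: 4, 41: 4}


def l2_descendants(node):
    """Collect layer-2 descendants of a node above layer 2, keeping duplicates."""
    found = []
    frontier = [node]
    while frontier:
        # Expand the whole frontier one level down, as get_l2children does.
        children = [c for n in frontier for c in CHILDREN.get(n, [])]
        found.extend(c for c in children if LAYERS[c] == 2)
        frontier = [c for c in children if LAYERS[c] > 2]
    return found


def duplicated_l2_ids(node):
    """Return the sorted layer-2 ids that appear more than once under `node`."""
    counts = Counter(l2_descendants(node))
    return sorted(i for i, n in counts.items() if n > 1)


print(duplicated_l2_ids(40))  # node 40 reaches 21 through both 30 and 31
print(duplicated_l2_ids(41))  # node 41 has a clean hierarchy
```

A non-empty result corresponds to the case where the functions in the diff raise `RuntimeError` or fail their assertion, because the number of layer-2 ids no longer equals the number of unique ids.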
23 changes: 19 additions & 4 deletions pychunkedgraph/graph/attributes.py
@@ -1,6 +1,9 @@
+# pylint: disable=invalid-name, missing-docstring, protected-access, raise-missing-from
+
 # TODO design to use these attributes across different clients
 # `family_id` is specific to bigtable

+from enum import Enum
 from typing import NamedTuple

 from .utils import serializers
@@ -101,20 +104,34 @@ class Connectivity:
         serializer=serializers.NumPyArray(dtype=basetypes.EDGE_AREA),
     )

-    CrossChunkEdge = _AttributeArray(
+    AtomicCrossChunkEdge = _AttributeArray(
         pattern=b"atomic_cross_edges_%d",
         family_id="3",
         serializer=serializers.NumPyArray(
             dtype=basetypes.NODE_ID, shape=(-1, 2), compression_level=22
         ),
     )

-    FakeEdges = _Attribute(
+    CrossChunkEdge = _AttributeArray(
+        pattern=b"cross_edges_%d",
+        family_id="4",
+        serializer=serializers.NumPyArray(
+            dtype=basetypes.NODE_ID, shape=(-1, 2), compression_level=22
+        ),
+    )
+
+    FakeEdgesCF3 = _Attribute(
         key=b"fake_edges",
         family_id="3",
         serializer=serializers.NumPyArray(dtype=basetypes.NODE_ID, shape=(-1, 2)),
     )

+    FakeEdges = _Attribute(
+        key=b"fake_edges",
+        family_id="4",
+        serializer=serializers.NumPyArray(dtype=basetypes.NODE_ID, shape=(-1, 2)),
+    )
+

 class Hierarchy:
     Child = _Attribute(
@@ -157,8 +174,6 @@ class GraphVersion:
 class OperationLogs:
     key = b"ioperations"

-    from enum import Enum
-
     class StatusCodes(Enum):
         SUCCESS = 0  # all is well, new changes persisted
         CREATED = 1  # log record created in storage
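
The `Connectivity` changes above keep the existing `atomic_cross_edges_%d` columns in family "3" and add `cross_edges_%d` columns in family "4". `fake_edges` likewise now exists under both families. A rough sketch of how such per-layer column keys could be derived follows. It assumes that `%d` is filled with a layer number, which is inferred from the attribute names and the commit "feat: cache cx edges at each layer" rather than confirmed by the diff. The tuples and the `layer_column` helper are hypothetical and do not reflect how `_AttributeArray` is actually implemented.

```python
from typing import Tuple

# (family_id, key pattern) pairs copied from the diff; the tuple layout is hypothetical.
ATOMIC_CROSS_EDGES = ("3", b"atomic_cross_edges_%d")
CACHED_CROSS_EDGES = ("4", b"cross_edges_%d")
FAKE_EDGES_CF3 = ("3", b"fake_edges")
FAKE_EDGES = ("4", b"fake_edges")


def layer_column(attr: Tuple[str, bytes], layer: int) -> Tuple[str, bytes]:
    """Fill the %d placeholder with a layer number (assumed meaning of %d)."""
    family_id, pattern = attr
    return family_id, pattern % layer


print(layer_column(CACHED_CROSS_EDGES, 3))
print(layer_column(ATOMIC_CROSS_EDGES, 2))
# Same key, different column family: the two fake-edge attributes stay distinct.
print(FAKE_EDGES != FAKE_EDGES_CF3)
```

Because the family is part of the column address, the old and new attributes can coexist in the same row, which is presumably what allows a migration to read from family "3" while writing to family "4".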