Skip to content
This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Commit

Permalink
Merge branch 'master' into incep_encoder
Browse files Browse the repository at this point in the history
  • Loading branch information
felix committed Oct 11, 2019
2 parents 2705c28 + 72c6d8f commit 02c96b0
Show file tree
Hide file tree
Showing 18 changed files with 183 additions and 64 deletions.
42 changes: 42 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,46 @@

# Release Note (`v0.0.44`)
> Release time: 2019-10-11 15:27:37

🙇 We'd like to thank all contributors for this new release! In particular,
hanhxiao, felix, Jem, 🙇


### 🆕 New Features

- [[```2fb0f4f9```](https://github.com/gnes-ai/gnes/commit/2fb0f4f9b7a241af258e2313f732aac7cc6b1d8f)] __-__ __flow__: add dump to jpg (*hanhxiao*)
- [[```552fcdfe```](https://github.com/gnes-ai/gnes/commit/552fcdfe9ffe627d134221000f6f59c6196e14a9)] __-__ __indexer-cli__: add as_response switcher to indexer cli (*hanhxiao*)
- [[```c8cedd04```](https://github.com/gnes-ai/gnes/commit/c8cedd04461f17870f226a71ce0c2fd71d14ed16)] __-__ __service__: remove async dump for better stability (*hanhxiao*)
- [[```1739c7b6```](https://github.com/gnes-ai/gnes/commit/1739c7b6c248e5961d67311331f723af7d9aa479)] __-__ __flow__: add client to flow (*hanhxiao*)
- [[```43b9d014```](https://github.com/gnes-ai/gnes/commit/43b9d014fd4e31d34f540687338c3bc48b908f80)] __-__ __flow__: add context manager to flow (*hanhxiao*)
- [[```ae0d4056```](https://github.com/gnes-ai/gnes/commit/ae0d40561705980fb3e7a367bfa3d1e92b916de2)] __-__ __flow__: first version of gnes flow (*hanhxiao*)

### 🐞 Bug fixes

- [[```c23ea61f```](https://github.com/gnes-ai/gnes/commit/c23ea61f8319e1c40aa91324abd05ea34ba1d6c9)] __-__ __frontend__: fix frontend blocking behavior (*hanhxiao*)
- [[```c880c9b0```](https://github.com/gnes-ai/gnes/commit/c880c9b0bfe7173b61ae6489669fea202f7200d0)] __-__ __service__: make service handler thread-safe (*hanhxiao*)
- [[```a3da0582```](https://github.com/gnes-ai/gnes/commit/a3da05829c2756c42f45a41572a9a0f2217d9d6a)] __-__ __flow__: fix flow unit test (*hanhxiao*)
- [[```6d118404```](https://github.com/gnes-ai/gnes/commit/6d118404283db1e2a5169c39e719690c55490780)] __-__ __ffmpeg__: threads=1 (*felix*)
- [[```bca5b5b7```](https://github.com/gnes-ai/gnes/commit/bca5b5b7fdc99f0ca666a3b0bf4bc77ae732c19c)] __-__ __base__: fix env expansion in gnes_config (*hanhxiao*)
- [[```72f4a044```](https://github.com/gnes-ai/gnes/commit/72f4a044f5c5fd547d72b9b38f0ecac9a75427a8)] __-__ __indexer__: fix empty chunk and dump_interval (*hanhxiao*)
- [[```9b79cdf5```](https://github.com/gnes-ai/gnes/commit/9b79cdf52aeca1ef4999b6ef306a816b9ad9dfbe)] __-__ __memory-leak__: try to fix memory leak danger (*felix*)
- [[```16097f3f```](https://github.com/gnes-ai/gnes/commit/16097f3f7051e43b1f21526dd0bef7e37c316904)] __-__ __video-decoder__: fix name (*felix*)
- [[```199a71a6```](https://github.com/gnes-ai/gnes/commit/199a71a6fb1a053d970dd910ce1f39c82541ad27)] __-__ __frontend__: remove duplicate receive (*hanhxiao*)
- [[```73dae6bd```](https://github.com/gnes-ai/gnes/commit/73dae6bdea612d375a103627e725c5ade5865011)] __-__ __service__: minor fix on the dump_interval (*hanhxiao*)
- [[```6f401905```](https://github.com/gnes-ai/gnes/commit/6f401905943728cdf0b9f206775ef5ee8347e59a)] __-__ __client__: fix bugs for client (*Jem*)
- [[```c5af9308```](https://github.com/gnes-ai/gnes/commit/c5af9308dafe179357b6b39e34f5705d64cef8b8)] __-__ __parser__: use str instead of textio stream to prevent serializer err (*hanhxiao*)
- [[```6a368335```](https://github.com/gnes-ai/gnes/commit/6a368335c802d2d0f1c5916d395c66ac46993814)] __-__ __cli__: show more detailed version info in cli (*hanhxiao*)

### 📗 Documentation

- [[```d60a24a9```](https://github.com/gnes-ai/gnes/commit/d60a24a90f2c0e2246a3da00fc860d94b1bfe669)] __-__ fix docs format (*hanhxiao*)

### 🍹 Other Improvements

- [[```3cc4b041```](https://github.com/gnes-ai/gnes/commit/3cc4b04178ab4bdc78402a50508ef80708d1227f)] __-__ fix unit test (*felix*)
- [[```57519198```](https://github.com/gnes-ai/gnes/commit/575191987c610e8c31300f936df377ebad970e7f)] __-__ __changelog__: update change log to v0.0.43 (*hanhxiao*)

# Release Note (`v0.0.43`)
> Release time: 2019-09-30 17:37:58
Expand Down
4 changes: 3 additions & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,9 @@
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
from os import path
import sys
from os import path

sys.path.insert(0, os.path.abspath('..'))

# -- Project information -----------------------------------------------------
Expand Down Expand Up @@ -51,6 +52,7 @@
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx_autodoc_typehints',
'sphinx.ext.viewcode',
'sphinxcontrib.apidoc',
'sphinxarg.ext',
Expand Down
3 changes: 2 additions & 1 deletion docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
sphinx-argparse
sphinxcontrib-apidoc
sphinxcontrib-apidoc
sphinx-autodoc-typehints
2 changes: 1 addition & 1 deletion gnes/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@

# do not change this line manually
# this is managed by git tag and updated on every release
__version__ = '0.0.43'
__version__ = '0.0.44'

# do not change this line manually
# this is managed by shell/make-proto.sh and updated on every execution
Expand Down
7 changes: 5 additions & 2 deletions gnes/base/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -217,6 +217,7 @@ def dump_full_path(self):
def yaml_full_path(self):
"""
Get the file path of the yaml config
:return:
"""
return os.path.join(self.work_dir, '%s.yml' % self.name)
Expand Down Expand Up @@ -247,7 +248,8 @@ def train(self, *args, **kwargs):
def dump(self, filename: str = None) -> None:
"""
Serialize the object to a binary file
:param filename: file path of the serialized file, if not given then `self.dump_full_path` is used
:param filename: file path of the serialized file, if not given then :py:attr:`dump_full_path` is used
"""
f = filename or self.dump_full_path
if not f:
Expand All @@ -260,7 +262,8 @@ def dump(self, filename: str = None) -> None:
def dump_yaml(self, filename: str = None) -> None:
"""
Serialize the object to a yaml file
:param filename: file path of the yaml file, if not given then `self.dump_yaml_path` is used
:param filename: file path of the yaml file, if not given then :py:attr:`dump_yaml_path` is used
"""
f = filename or self.yaml_full_path
if not f:
Expand Down
3 changes: 3 additions & 0 deletions gnes/cli/parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -288,6 +288,9 @@ def set_indexer_parser(parser=None):
if not parser:
parser = set_base_parser()
_set_sortable_service_parser(parser)
parser.add_argument('--as_response', type=ActionNoYes, default=True,
help='convert the message type from request to response after indexing. '
'turn it off if you want to chain other services after this index service.')

return parser

Expand Down
7 changes: 4 additions & 3 deletions gnes/client/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
import sys
import time
import zipfile
from typing import Generator
from typing import Iterator

from termcolor import colored

Expand Down Expand Up @@ -78,6 +78,7 @@ def query_callback(self, req: 'gnes_pb2.Request', resp: 'gnes_pb2.Response'):
"""
callback after get the query result
override this method to customize query behavior
:param resp: response
:param req: query
:return:
Expand All @@ -86,14 +87,14 @@ def query_callback(self, req: 'gnes_pb2.Request', resp: 'gnes_pb2.Response'):
print(resp)

@property
def bytes_generator(self) -> Generator[bytes, None, None]:
def bytes_generator(self) -> Iterator[bytes]:
if self._bytes_generator:
return self._bytes_generator
else:
raise ValueError('bytes_generator is empty or not set')

@bytes_generator.setter
def bytes_generator(self, bytes_gen: Generator[bytes, None, None]):
def bytes_generator(self, bytes_gen: Iterator[bytes]):
if self._bytes_generator:
self.logger.warning('bytes_generator is not empty, overrided')
self._bytes_generator = bytes_gen
Expand Down
84 changes: 67 additions & 17 deletions gnes/flow/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
from collections import OrderedDict, defaultdict
from contextlib import ExitStack
from functools import wraps
from typing import Union, Tuple, List, Optional, Generator
from typing import Union, Tuple, List, Optional, Iterator

from ..cli.parser import set_router_parser, set_indexer_parser, \
set_frontend_parser, set_preprocessor_parser, \
Expand Down Expand Up @@ -61,12 +61,14 @@ class Flow:
"""
GNES Flow: an intuitive way to build workflow for GNES.
You can use `.add()` then `.build()` to customize your own workflow.
You can use :py:meth:`.add()` then :py:meth:`.build()` to customize your own workflow.
For example:
.. highlight:: python
.. code-block:: python
from gnes.flow import Flow, Service as gfs
f = (Flow(check_version=False, route_table=True)
.add(gfs.Preprocessor, yaml_path='BasePreprocessor')
.add(gfs.Encoder, yaml_path='BaseEncoder')
Expand All @@ -76,9 +78,12 @@ class Flow:
flow.index()
...
You can also use the shortcuts, e.g. :py:meth:`add_encoder`, :py:meth:`add_preprocessor`.
It is recommend to use flow in the context manner as showed above.
Note the different default copy behaviors in `.add()` and `.build()`:
`.add()` always copy the flow by default, whereas `.build()` modify the flow in place.
Note the different default copy behaviors in :py:meth:`.add()` and :py:meth:`.build()`:
:py:meth:`.add()` always copy the flow by default, whereas :py:meth:`.build()` modify the flow in place.
You can change this behavior by giving an argument `copy_flow=False`.
"""
Expand Down Expand Up @@ -134,6 +139,12 @@ def from_yaml(orchestration: str) -> 'Flow':

@_build_level(BuildLevel.GRAPH)
def to_mermaid(self, left_right: bool = True):
"""
Output the mermaid graph for visualization
:param left_right: render the flow in left-to-right manner, otherwise top-down manner.
:return:
"""
mermaid_graph = OrderedDict()
for k in self._service_nodes.keys():
mermaid_graph[k] = []
Expand Down Expand Up @@ -161,22 +172,60 @@ def to_mermaid(self, left_right: bool = True):
['graph %s' % ('LR' if left_right else 'TD')] + [ss for s in mermaid_graph.values() for ss in
s] + style + class_def)

self.logger.info(
'copy-paste the output and visualize it with: https://mermaidjs.github.io/mermaid-live-editor/')
return mermaid_str

def train(self, bytes_gen: Generator[bytes, None, None] = None, **kwargs):
@_build_level(BuildLevel.GRAPH)
def to_jpg(self, path: str = 'flow.jpg', left_right: bool = True):
"""
Rendering the current flow as a jpg image, this will call :py:meth:`to_mermaid` and it needs internet connection
:param path: the file path of the image
:param left_right: render the flow in left-to-right manner, otherwise top-down manner.
:return:
"""
import base64
from urllib.request import Request, urlopen
mermaid_str = self.to_mermaid(left_right)
encoded_str = base64.b64encode(bytes(mermaid_str, 'utf-8')).decode('utf-8')
print('https://mermaidjs.github.io/mermaid-live-editor/#/view/%s' % encoded_str)
self.logger.info('saving jpg...')
req = Request('https://mermaid.ink/img/%s' % encoded_str, headers={'User-Agent': 'Mozilla/5.0'})
with open(path, 'wb') as fp:
fp.write(urlopen(req).read())
self.logger.info('done')

def train(self, bytes_gen: Iterator[bytes] = None, **kwargs):
"""Do training on the current flow
It will start a :py:class:`CLIClient` and call :py:func:`train`.
:param bytes_gen: An iterator of bytes. If not given, then you have to specify it in `kwargs`.
:param kwargs: accepts all keyword arguments of `gnes client` CLI
"""
self._call_client(bytes_gen, mode='train', **kwargs)

def index(self, bytes_gen: Generator[bytes, None, None] = None, **kwargs):
def index(self, bytes_gen: Iterator[bytes] = None, **kwargs):
"""Do indexing on the current flow
It will start a :py:class:`CLIClient` and call :py:func:`index`.
:param bytes_gen: An iterator of bytes. If not given, then you have to specify it in `kwargs`.
:param kwargs: accepts all keyword arguments of `gnes client` CLI
"""
self._call_client(bytes_gen, mode='index', **kwargs)

def query(self, bytes_gen: Generator[bytes, None, None] = None, **kwargs):
def query(self, bytes_gen: Iterator[bytes] = None, **kwargs):
"""Do indexing on the current flow
It will start a :py:class:`CLIClient` and call :py:func:`query`.
:param bytes_gen: An iterator of bytes. If not given, then you have to specify it in `kwargs`.
:param kwargs: accepts all keyword arguments of `gnes client` CLI
"""
self._call_client(bytes_gen, mode='query', **kwargs)

@_build_level(BuildLevel.RUNTIME)
def _call_client(self, bytes_gen: Generator[bytes, None, None] = None, **kwargs):

def _call_client(self, bytes_gen: Iterator[bytes] = None, **kwargs):
os.unsetenv('http_proxy')
os.unsetenv('https_proxy')
args, p_args = self._get_parsed_args(self, set_client_cli_parser, kwargs)
Expand All @@ -188,26 +237,26 @@ def _call_client(self, bytes_gen: Generator[bytes, None, None] = None, **kwargs)
c.start()

def add_frontend(self, *args, **kwargs) -> 'Flow':
"""Add a frontend to the current flow, a shortcut of add(Service.Frontend)
"""Add a frontend to the current flow, a shortcut of :py:meth:`add(Service.Frontend)`.
Usually you dont need to call this function explicitly, a flow object contains a frontend service by default.
This function is useful when you build a flow without the frontend and want to customize the frontend later.
"""
return self.add(Service.Frontend, *args, **kwargs)

def add_encoder(self, *args, **kwargs) -> 'Flow':
"""Add an encoder to the current flow, a shortcut of add(Service.Encoder)"""
"""Add an encoder to the current flow, a shortcut of :py:meth:`add(Service.Encoder)`"""
return self.add(Service.Encoder, *args, **kwargs)

def add_indexer(self, *args, **kwargs) -> 'Flow':
"""Add an indexer to the current flow, a shortcut of add(Service.Indexer)"""
"""Add an indexer to the current flow, a shortcut of :py:meth:`add(Service.Indexer)`"""
return self.add(Service.Indexer, *args, **kwargs)

def add_preprocessor(self, *args, **kwargs) -> 'Flow':
"""Add a router to the current flow, a shortcut of add(Service.Preprocessor)"""
"""Add a preprocessor to the current flow, a shortcut of :py:meth:`add(Service.Preprocessor)`"""
return self.add(Service.Preprocessor, *args, **kwargs)

def add_router(self, *args, **kwargs) -> 'Flow':
"""Add a preprocessor to the current flow, a shortcut of add(Service.Router)"""
"""Add a router to the current flow, a shortcut of :py:meth:`add(Service.Router)`"""
return self.add(Service.Router, *args, **kwargs)

def add(self, service: 'Service',
Expand Down Expand Up @@ -412,7 +461,8 @@ def _build_graph(self, copy_flow: bool) -> 'Flow':
def build(self, backend: Optional[str] = 'thread', copy_flow: bool = False, *args, **kwargs) -> 'Flow':
"""
Build the current flow and make it ready to use
:param backend: supported 'thread', 'process', 'swarm', 'k8s', 'shell'
:param backend: supported 'thread', 'process', 'swarm', 'k8s', 'shell', if None then only build graph only
:param copy_flow: return the copy of the current flow
:return: the current flow (by default)
"""
Expand Down
7 changes: 5 additions & 2 deletions gnes/indexer/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections import defaultdict
from functools import wraps
from typing import List, Any, Union, Callable, Tuple
from collections import defaultdict

import numpy as np

Expand All @@ -30,7 +30,8 @@ def __init__(self,
is_big_score_similar: bool = False,
*args, **kwargs):
"""
Base indexer, a valid indexer must implement `add` and `query` methods
Base indexer, a valid indexer must implement :py:meth:`add` and :py:meth:`query` methods
:type score_fn: advanced score function
:type normalize_fn: normalizing score function
:type is_big_score_similar: when set to true, then larger score means more similar
Expand Down Expand Up @@ -74,6 +75,7 @@ def __init__(self, helper_indexer: 'BaseChunkIndexerHelper' = None, *args, **kwa
def add(self, keys: List[Tuple[int, int]], vectors: np.ndarray, weights: List[float], *args, **kwargs):
"""
adding new chunks and their vector representations
:param keys: list of (doc_id, offset) tuple
:param vectors: vector representations
:param weights: weight of the chunks
Expand Down Expand Up @@ -159,6 +161,7 @@ class BaseDocIndexer(BaseIndexer):
def add(self, keys: List[int], docs: List['gnes_pb2.Document'], *args, **kwargs):
"""
adding new docs and their protobuf representation
:param keys: list of doc_id
:param docs: list of protobuf Document objects
"""
Expand Down
1 change: 1 addition & 0 deletions gnes/indexer/chunk/annoy.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ class AnnoyIndexer(BCI):
def __init__(self, num_dim: int, data_path: str, metric: str = 'angular', n_trees: int = 10, *args, **kwargs):
"""
Initialize an AnnoyIndexer
:param num_dim: when set to -1, then num_dim is auto decided on first .add()
:param data_path: index data file managed by the annoy indexer
:param metric:
Expand Down
1 change: 1 addition & 0 deletions gnes/indexer/chunk/faiss.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ class FaissIndexer(BCI):
def __init__(self, num_dim: int, index_key: str, data_path: str, *args, **kwargs):
"""
Initialize an FaissIndexer
:param num_dim: when set to -1, then num_dim is auto decided on first .add()
:param data_path: index data file managed by the faiss indexer
"""
Expand Down
3 changes: 3 additions & 0 deletions gnes/indexer/doc/filesys.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ def add(self, keys: List[int], docs: List['gnes_pb2.Document'], *args, **kwargs)
"""
write GIFs of each document into disk
folder structure: /data_path/doc_id/0.gif, 1.gif...
:param keys: list of doc id
:param docs: list of docs
"""
Expand All @@ -55,6 +56,8 @@ def add(self, keys: List[int], docs: List['gnes_pb2.Document'], *args, **kwargs)

def query(self, keys: List[int], *args, **kwargs) -> List['gnes_pb2.Document']:
"""
Find the doc according to the keys
:param keys: list of doc id
:return: list of documents whose chunks field contain all the GIFs of this doc(one GIF per chunk)
"""
Expand Down
1 change: 1 addition & 0 deletions gnes/router/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ def reduce_embedding(self, accum_msgs: List['gnes_pb2.Message'], msg_type: str,
def apply(self, msg: 'gnes_pb2.Message', accum_msgs: List['gnes_pb2.Message'], *args, **kwargs) -> None:
"""
reduce embeddings from encoders (means, concat ....)
:param msg: the current message
:param accum_msgs: accumulated messages
"""
Expand Down
Loading

0 comments on commit 02c96b0

Please sign in to comment.