Updated the docs to the new converter api
jkminder committed Jul 14, 2022
1 parent fae6a04 commit e3d3dcd
Showing 5 changed files with 28 additions and 9 deletions.
3 changes: 2 additions & 1 deletion docs/source/api/api.rst
@@ -7,4 +7,5 @@ API Reference

Core <core>
Relational modules <relational_modules>
Py2neo extensions <py2neo_extensions>
Py2neo extensions <py2neo_extensions>
Utils <utils>
8 changes: 8 additions & 0 deletions docs/source/api/utils.rst
@@ -0,0 +1,8 @@
-----------------
Utils
-----------------

General utility functions for working with rel2graph.


.. autofunction:: rel2graph.utils.load_file
21 changes: 15 additions & 6 deletions docs/source/converter.rst
@@ -2,26 +2,35 @@ Converter
=========

The |Converter| handles the main conversion of the relational data.
It is initialised with the *conversion schema filename*, the iterator and the graph.
It is initialised with the *conversion schema* as a string, the iterator and the graph.

.. code-block:: python
from rel2graph import Converter
converter = Converter(config_filename, iterator, graph)
converter = Converter(conversion_schema, iterator, graph)
To start the conversion, one simply calls the object. It then iterates twice over the iterator: first to process all the nodes and, secondly, to create all relations. This makes sure that any node a relation refers to is already created first.

.. code-block:: python
converter()
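
The two-pass order described above can be sketched with plain Python data structures. This is only an illustration of the idea, not rel2graph's implementation: the ``convert_two_pass`` function, the record layout and the dictionary-based "graph" are hypothetical stand-ins. Note that the relation record comes first in the input, which is exactly the case a single pass could not handle.

```python
def convert_two_pass(records):
    """Toy two-pass conversion: create all nodes first, then all relations."""
    nodes = {}
    relations = []

    # First pass: only create nodes.
    for record in records:
        if record["type"] == "person":
            nodes[record["id"]] = {"name": record["name"]}

    # Second pass: only create relations; both endpoints already exist.
    for record in records:
        if record["type"] == "knows":
            start = nodes[record["from"]]
            end = nodes[record["to"]]
            relations.append((start["name"], "KNOWS", end["name"]))

    return nodes, relations


# Hypothetical input: the relation appears before the nodes it refers to.
records = [
    {"type": "knows", "from": 1, "to": 2},
    {"type": "person", "id": 1, "name": "alice"},
    {"type": "person", "id": 2, "name": "bob"},
]
nodes, relations = convert_two_pass(records)
```
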
If your conversion schema is saved in a separate file, you can use the provided ``load_file`` function to load it into a string.

.. code-block:: python
from rel2graph import Converter
from rel2graph.utils import load_file
converter = Converter(load_file(conversion_schema_file), iterator, graph)
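
For readers who want to see what "loading the schema into a string" amounts to, here is a minimal sketch. It assumes that ``load_file`` does little more than read a text file, which this page does not confirm; ``read_schema_text`` and the placeholder file content are hypothetical and only stand in for the real function and a real schema.

```python
import tempfile
from pathlib import Path


def read_schema_text(path):
    """Hypothetical stand-in for load_file: return the file content as a string."""
    return Path(path).read_text(encoding="utf-8")


# Write a tiny placeholder (not a real conversion schema) and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    schema_path = Path(tmp_dir) / "schema.yaml"
    schema_path.write_text("placeholder: schema\n", encoding="utf-8")
    conversion_schema = read_schema_text(schema_path)
```

The resulting ``conversion_schema`` is an ordinary string, which is the form the |Converter| now expects as its first argument.
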
The |Converter| can utilise **multithreading**. When initialising you can set the number of parallel workers. Each worker operates in its own thread.
Be aware that committing to the graph is often still serialised, since the semantics require it (e.g. nodes must be committed before any relation, and when [merging nodes](#merging-nodes) all the nodes must be committed serially). The primary use case for multiple workers is therefore when your resources rely on a network connection (e.g. a remote database) or when you require a lot of [matching](#match) in the graph (matching is parallelised).

.. code-block:: python
converter = Converter(config_filename, iterator, graph, num_workers = 20)
converter = Converter(conversion_schema, iterator, graph, num_workers = 20)
**Attention:** If you enable more than one worker, ensure that all your :doc:`wrappers <wrapper>` support multithreading (add locks if necessary).

@@ -40,7 +49,7 @@ In the first cell, you initially have created the converter object and called it

.. code-block:: python
converter = Converter(config_filename, iterator, graph)
converter = Converter(conversion_schema, iterator, graph)
converter()
Now a ``ConnectionException`` is raised due to network problems. You can fix the problem and then call the converter again in a new cell:
@@ -58,14 +67,14 @@ You have a small error in your :doc:`conversion schema <conversion_schema>` for

.. code-block:: python
converter = Converter(config_filename, iterator, graph)
converter = Converter(conversion_schema, iterator, graph)
converter()
Now, for example, a ``KeyError`` is raised because an attribute name was misspelled. Instead of rerunning the whole conversion (which might take hours), you can fix the schema file, reload it, and call the converter again:

.. code-block:: python
converter.reload_config(config_filename)
converter.reload_schema(conversion_schema)
converter()
The converter will just continue where it left off with the new :doc:`conversion schema <conversion_schema>`.
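
The "continue where it left off" behaviour can be illustrated with a small toy class. This is a sketch under stated assumptions, not how the |Converter| is implemented: ``ResumableTask``, ``reload_handler`` and the handlers are hypothetical names, and the misspelled key only mimics a small schema error.

```python
class ResumableTask:
    """Toy task that remembers how far it got, so a later call can resume."""

    def __init__(self, items, handler):
        self.items = list(items)
        self.handler = handler
        self.position = 0  # index of the next unprocessed item
        self.results = []

    def reload_handler(self, handler):
        # Swap in corrected logic without resetting the progress.
        self.handler = handler

    def __call__(self):
        while self.position < len(self.items):
            # If the handler raises, position is not advanced,
            # so the same item is retried on the next call.
            self.results.append(self.handler(self.items[self.position]))
            self.position += 1
        return self.results


def buggy_handler(row):
    # Misspelled key for wide petals, mimicking a small schema error.
    key = "petal_width" if row["petal_width"] < 3 else "petal_widht"
    return row[key]


rows = [{"petal_width": 1}, {"petal_width": 2}, {"petal_width": 3}]
task = ResumableTask(rows, buggy_handler)

try:
    task()
except KeyError:
    failed_at = task.position  # rows before this index are already done

# Fix the error and call again: only the remaining rows are processed.
task.reload_handler(lambda row: row["petal_width"])
results = task()
```
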
2 changes: 1 addition & 1 deletion docs/source/introduction.rst
@@ -16,7 +16,7 @@ we can keep supplying it with resources without writing more code.
:alt: rel2graph factory

Since there might be different types of resources, we build a factory per resource type.
One specifies all the "blueprints" for all the factories in thr |convschema| file.
One specifies all the "blueprints" for all the factories in the |convschema| file.
A |Converter|, the main object of *rel2graph*, will take this file and construct all the factories
based on your "blueprints". For a set of supplied resources the |Converter| will automatically select
the correct factory, use it to produce a graph out of the resource and merge the produced graph with
3 changes: 2 additions & 1 deletion docs/source/quick_start.rst
@@ -41,6 +41,7 @@ For an entity with ``Flower["petal_width"] = 5``, the outputed node will have th
import pandas as pd
from rel2graph.relational_modules.pandas import PandasDataframeIterator
from rel2graph import IteratorIterator, Converter, Attribute, register_attribute_postprocessor
from rel2graph.utils import load_file
# Create a connection to the neo4j graph with the py2neo Graph object
graph = Graph(scheme="http", host="localhost", port=7474, auth=('neo4j', 'password'))
@@ -60,7 +61,7 @@ For an entity with ``Flower["petal_width"] = 5``, the outputed node will have th
iterator = IteratorIterator([pandas_iterator, iris_iterator])
# Create converter instance with schema, the final iterator and the graph
converter = Converter("schema.yaml", iterator, graph)
converter = Converter(load_file("schema.yaml"), iterator, graph)
# Start the conversion
converter()