From e3d3dcd7ebf9f04a87959686936cb64f86881fcb Mon Sep 17 00:00:00 2001 From: Julian Minder Date: Thu, 14 Jul 2022 12:40:11 +0200 Subject: [PATCH] Updated the docs to the new converter api --- docs/source/api/api.rst | 3 ++- docs/source/api/utils.rst | 8 ++++++++ docs/source/converter.rst | 21 +++++++++++++++------ docs/source/introduction.rst | 2 +- docs/source/quick_start.rst | 3 ++- 5 files changed, 28 insertions(+), 9 deletions(-) create mode 100644 docs/source/api/utils.rst diff --git a/docs/source/api/api.rst b/docs/source/api/api.rst index c5a3ade..cf6779d 100644 --- a/docs/source/api/api.rst +++ b/docs/source/api/api.rst @@ -7,4 +7,5 @@ API Reference Core Relational modules - Py2neo extensions \ No newline at end of file + Py2neo extensions + Utils \ No newline at end of file diff --git a/docs/source/api/utils.rst b/docs/source/api/utils.rst new file mode 100644 index 0000000..ec4a3c4 --- /dev/null +++ b/docs/source/api/utils.rst @@ -0,0 +1,8 @@ +----------------- +Utils +----------------- + +General utility functions for working with rel2graph. + + +.. autofunction:: rel2graph.utils.load_file \ No newline at end of file diff --git a/docs/source/converter.rst b/docs/source/converter.rst index 9e2d876..03bd653 100644 --- a/docs/source/converter.rst +++ b/docs/source/converter.rst @@ -2,13 +2,13 @@ Converter ========= The |Converter| handles the main conversion of the relational data. -It is initialised with the *conversion schema filename*, the iterator and the graph. +It is initialised with the *conversion schema* as a string, the iterator and the graph. .. code-block:: python from rel2graph import Converter - converter = Converter(config_filename, iterator, graph) + converter = Converter(conversion_schema, iterator, graph) To start the conversion, one simply calls the object. It then iterates twice over the iterator: first to process all the nodes and, secondly, to create all relations. 
This makes sure that any node a relation refers to is already created first. @@ -16,12 +16,21 @@ To start the conversion, one simply calls the object. It then iterates twice ove converter() +If your conversion schema is saved in a separate file, you can use the provided ``load_file`` function to load it into a string. + +.. code-block:: python + + from rel2graph import Converter + from rel2graph.utils import load_file + + converter = Converter(load_file(conversion_schema_file), iterator, graph) + The |Converter| can utilise **multithreading**. When initialising you can set the number of parallel workers. Each worker operates in its own thread. Be aware that the committing to the graph is often still serialized, since the semantics require this (e.g. nodes must be committed before any relation or when [merging nodes](#merging-nodes) all the nodes must be serially committed). So the primary use-case of using multiple workers is if your resources are utilising a network connection (e.g. remote database) or if you require a lot of [matching](#match) in the graph (matching is parallelised). .. code-block:: python - converter = Converter(config_filename, iterator, graph, num_workers = 20) + converter = Converter(conversion_schema, iterator, graph, num_workers = 20) **Attention:** If you enable more than 1 workers, ensure that all your :doc:`wrappers ` support multithreading (add locks if necessary). @@ -40,7 +49,7 @@ In the first cell, you initially have created the converter object and called it .. code-block:: python - converter = Converter(config_filename, iterator, graph) + converter = Converter(conversion_schema, iterator, graph) converter() Now a ``ConnectionException`` is raised due to network problems. You can now fix the problem and then recall the converter in a new cell: @@ -58,14 +67,14 @@ You have a small error in your :doc:`conversion schema ` for ..
code-block:: python - converter = Converter(config_filename, iterator, graph) + converter = Converter(conversion_schema, iterator, graph) converter() Now, e.g. ``KeyError`` is raised since the attribute name was written slightly wrong. Instead of rerunning the whole conversion (which might take hours), you can fix the schema file and reload the schema file and recall the converter: .. code-block:: python - converter.reload_config(config_filename) + converter.reload_schema(conversion_schema) converter() The converter will just continue where it left off with the new :doc:`conversion schema `. diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst index ef718f6..94aac7f 100644 --- a/docs/source/introduction.rst +++ b/docs/source/introduction.rst @@ -16,7 +16,7 @@ we can keep supplying it with resources without writing more code. :alt: rel2graph factory Since there might be different types of resources, we build a factory per resource type. -One specifies all the "blueprints" for all the factories in thr |convschema| file. +One specifies all the "blueprints" for all the factories in the |convschema| file. A |Converter|, the main object of *rel2graph*, will take this file and construct all the factories based on your "blueprints". 
For a set of supplied resources the |Converter| will automatically select the correct factory, use it to produce a graph out of the resource and merge the produced graph with diff --git a/docs/source/quick_start.rst b/docs/source/quick_start.rst index cce3913..4fa5bae 100644 --- a/docs/source/quick_start.rst +++ b/docs/source/quick_start.rst @@ -41,6 +41,7 @@ For an entity with ``Flower["petal_width"] = 5``, the outputed node will have th import pandas as pd from rel2graph.relational_modules.pandas import PandasDataframeIterator from rel2graph import IteratorIterator, Converter, Attribute, register_attribute_postprocessor + from rel2graph.utils import load_file # Create a connection to the neo4j graph with the py2neo Graph object graph = Graph(scheme="http", host="localhost", port=7474, auth=('neo4j', 'password')) @@ -60,7 +61,7 @@ For an entity with ``Flower["petal_width"] = 5``, the outputed node will have th iterator = IteratorIterator([pandas_iterator, iris_iterator]) # Create converter instance with schema, the final iterator and the graph - converter = Converter("schema.yaml", iterator, graph) + converter = Converter(load_file("schema.yaml"), iterator, graph) # Start the conversion converter()
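Note on the API change documented by this patch: the |Converter| now receives the conversion schema as a string rather than a filename, and ``load_file`` bridges the two. The sketch below illustrates that idea without importing rel2graph. ``load_file_sketch`` is a hypothetical stand-in, written under the assumption that ``rel2graph.utils.load_file`` simply returns the file's text; the real implementation may differ. The schema content is a placeholder, not valid schema syntax.

```python
import tempfile
from pathlib import Path


def load_file_sketch(path):
    """Hypothetical stand-in for rel2graph.utils.load_file.

    Assumption: the real helper just returns the file's content as a string.
    """
    return Path(path).read_text(encoding="utf-8")


# Write a placeholder schema file, then load it back into a string.
with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "schema.yaml"
    schema_path.write_text("# placeholder conversion schema\n", encoding="utf-8")
    conversion_schema = load_file_sketch(schema_path)

# A string like this is what the patched docs pass as the first argument:
#     converter = Converter(conversion_schema, iterator, graph)
print(type(conversion_schema).__name__)
```

Passing a string instead of a path means the schema could also come from other sources, for example one built in memory, without touching the file system.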