initial version of migration guide #121

Merged
merged 12 commits into from
Jul 11, 2019
9 changes: 3 additions & 6 deletions MIGRATION_NOTES.md
@@ -142,15 +142,12 @@ with context as client.context():
strings (entity_pb2.Value.string_value). At read time, a `StringProperty`
will accept either a string or blob value, so compatibility is maintained
with legacy databases.
- Instances of google.appengine.datastore.datastore_query.Order have been
replaced by a simple list of field names for ordering.
- The QueryOptions class from google.cloud.ndb.query, has been reimplemented,
since google.appengine.datastore.datastore_rpc.Configuration is no longer
available. It still uses the same signature, but does not support original
Configuration methods.
- Because google.appengine.datastore.datastore_query.Order is no longer
available, the `order` parameter for the query.Query constructor has been
replaced by a list or tuple.
- Because google.appengine.datastore.datastore_query.Order is no longer
available, the ndb.query.PropertyOrder class has been created to replace it.
- Transaction propagation is no longer supported. This was a feature of the
older Datastore RPC library which is no longer used. Starting a new
transaction when a transaction is already in progress in the current context
1 change: 1 addition & 0 deletions docs/index.rst
@@ -19,6 +19,7 @@
blobstore
metadata
stats
migrating

This is a Python 3 version of the `ndb` client library for use with
`Google Cloud Datastore <https://cloud.google.com/datastore>`_.
225 changes: 225 additions & 0 deletions docs/migrating.rst
@@ -0,0 +1,225 @@
######################################
Migrating from Python 2 version of NDB
######################################

While every attempt has been made to keep compatibility with the previous
version of `ndb`, there are fundamental differences at the platform level,
which have made it necessary in some cases to depart from the original
implementation, and sometimes even to remove existing functionality
altogether.

Because one of the main objectives of this rewrite was to be able to use `ndb`


I initially assumed "independently from GAE" meant "in another Google Cloud Product". How about:
One of the main objectives of this rewrite was to enable ndb for use in any Python environment, not just Google App Engine. As a result, many of the ndb APIs that relied on GAE environment and runtime variables, resources, and legacy APIs have been dropped.


I assumed "use ndb independently from Google App Engine" meant "use nbd in another Google Cloud product." How about changing the paragraph to this:
One of the main objectives of this rewrite was to enable ndb for use in any Python environment, not just Google App Engine. As a result, ndb APIs that depended on GAE environment and runtime variables, resources, or legacy APIs have been dropped.

independently from Google App Engine, the legacy APIs from GAE cannot be
depended upon. Also, any environment and runtime variables and resources will
not be available when running outside of GAE. This means that many `ndb` APIs
that depended on GAE have been changed, and many APIs that accessed GAE
resources directly have been dropped.

Aside from this, there are many differences between the Datastore APIs
provided by GAE and those provided by the newer Google Cloud Platform. These
differences have required some code and API changes as well.
Contributor


typo: diffeences


Finally, in many cases, new features of Python 3 have eliminated the need for
some code, particularly from the old `utils` module.

If you are migrating code, these changes can generate some confusion. This
document will cover the most common migration issues.

Setting up a connection
=======================

The most important difference from the previous `ndb` version is that the new
`ndb` requires the use of a client to set up a runtime context for a project.
This is necessary because `ndb` can now be used in any Python environment, so
we can no longer assume it's running in the context of a GAE request.

The `ndb` client uses ``google.auth`` for authentication, consistent with other
Google Cloud Platform client libraries. The client can take a `credentials` parameter or
Contributor


Recommend "The ndb client uses google.auth for authentication, consistent with other Google Cloud Platform client libraries."

get the credentials from the `GOOGLE_APPLICATION_CREDENTIALS` environment
variable, which is the recommended option.
Contributor


Recommend we also link to https://cloud.google.com/storage/docs/reference/libraries for authentication advice.


After instantiating a client, it's necessary to establish a runtime context,
andrewsg marked this conversation as resolved.
using the ``Client.context`` method. All interactions with the database must
be within the context obtained from this call::

    from google.cloud import ndb

    client = ndb.Client()

    with client.context() as context:
        # Every interaction with the database must happen inside this block;
        # do_something_with_ndb() stands in for your own code.
        do_something_with_ndb()

Contributor

swap context and client.context()

Note that the example above assumes the Google credentials are set in the
environment.

Keys
====

There are some methods from the ``key`` module that are not implemented in
this version of `ndb`:
Contributor


Do we want to explain why?


I agree. Based on https://cloud.google.com/appengine/docs/standard/python/ndb/keyclass#class_methods, I think this could say:
These methods were used to pass keys to and from the db Datastore API, which is no longer supported.


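- Key.from_old_key.
- Key.to_old_key.

These methods were used to pass keys to and from the legacy ``db`` Datastore
API, which is no longer supported.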

Properties
==========

There are various small changes in some of the model properties that might
trip you up when migrating code. Here are some of them, for quick reference:

- The `BlobProperty` constructor only sets `_compressed` if explicitly
passed. The original always set `_compressed`.
- In the same fashion, the `JsonProperty` constructor only sets
`_json_type` if explicitly passed.


] seems to be a typo.

- Similarly, the `DateTimeProperty` constructor only sets `_auto_now` and
`_auto_now_add` if explicitly passed.
- `TextProperty(indexed=True)` and `StringProperty(indexed=False)` are no
longer supported. That is, a `TextProperty` can no longer be indexed, and a
`StringProperty` is always indexed.


Does this mean that TextProperty can no longer be indexed, whereas StringProperty is always indexed?
If so, for clarification can you add:
That is, TextProperty can no longer be indexed, whereas StringProperty is always indexed.

- The `Property()` constructor (and subclasses) originally accepted both
`unicode` and `str` (the Python 2 versions) for `name` (and `kind`) but now
only accept `str`.
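To see what "only sets ... if explicitly passed" means in practice, here is a
minimal sketch of the general pattern. It uses a hypothetical
``ToyBlobProperty`` class rather than the real ``BlobProperty``, and the
class-level default of ``False`` is an assumption made for illustration. An
option that is not passed is read from the class attribute instead of being
stored on the instance:

```python
class ToyBlobProperty:
    """Hypothetical stand-in for a property class; not ndb's BlobProperty."""

    # Class-level default, shared by every instance that does not override it.
    _compressed = False

    def __init__(self, name=None, compressed=None):
        self._name = name
        # Only store the option on the instance when the caller passed it.
        if compressed is not None:
            self._compressed = compressed


default = ToyBlobProperty("data")
explicit = ToyBlobProperty("data", compressed=True)

# Attribute lookup falls back to the class default...
print(default._compressed)              # False
print(explicit._compressed)             # True
# ...but only the explicit instance carries the attribute itself.
print("_compressed" in vars(default))   # False
print("_compressed" in vars(explicit))  # True
```

Code that only reads ``prop._compressed`` should see the same value either way;
code that inspects or copies an instance's ``__dict__`` may notice the
difference.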

QueryOptions and Query Order
============================

The ``QueryOptions`` class from ``google.cloud.ndb.query`` has been
reimplemented, since ``google.appengine.datastore.datastore_rpc.Configuration``
is no longer available. It still uses the same signature, but does not support
the original ``Configuration`` methods.

Similarly, because ``google.appengine.datastore.datastore_query.Order`` is no
Contributor


Typo: s/Similarly,b ecause/Similarly, because/

longer available, the ``ndb.query.PropertyOrder`` class has been created to
replace it.

MessageProperty and EnumProperty
================================

These properties, from the ``ndb.msgprop`` module, depend on the Google
Protocol RPC Library, or `protorpc`, which is not an `ndb` dependency. For
this reason, they are not part of this version of `ndb`.

Tasklets
========

When writing a `tasklet`, it is no longer necessary to raise a ``Return``
exception to return the result. A normal ``return`` can be used instead::

    from google.cloud import ndb

    @ndb.tasklet
    def get_cart():
        # CartItem is assumed to be an ndb.Model subclass defined elsewhere.
        cart = yield CartItem.query().fetch_async()
        return cart

Note that ``raise Return(cart)`` can still be used, but it's not recommended.
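A plain ``return`` works because, since Python 3.3 (PEP 380), a generator's
return value is carried by the ``StopIteration`` exception that ends it, so
whatever drives the generator can read it from ``StopIteration.value``. The toy
driver below is a hypothetical, synchronous sketch of that mechanism, not
`ndb`'s event loop: each yielded value is treated as an already-resolved result
and sent straight back into the generator.

```python
def run_tasklet(gen):
    """Drive a generator-based 'tasklet' to completion (toy, synchronous)."""
    result = None
    while True:
        try:
            # Send the previous result back in; the first send must be None.
            yielded = gen.send(result)
        except StopIteration as stop:
            # A plain `return value` in the generator ends up here.
            return stop.value
        # A real scheduler would wait on a future; here the yielded value
        # is treated as already resolved.
        result = yielded


def get_cart():
    cart = yield ["apple", "pear"]  # stands in for fetch_async()
    return cart


print(run_tasklet(get_cart()))  # ['apple', 'pear']
```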

There are some methods from the ``tasklet`` module that are not implemented in
this version of `ndb`, mainly because of changes in how an `ndb` context is
created and used in this version:

- add_flow_exception.
- make_context.
- make_default_context.
- QueueFuture.
- ReducedFuture.
- SerialQueueFuture.
- set_context.
- toplevel.
Contributor


I think we can still implement toplevel. Or did you look at it and find out we couldn't?

Contributor Author


I only took a quick glance while going through the TODO list, and I thought its reliance on _make_cloud_datastore_context meant it was out. I didn't stop to think about the implementation. Do you think I should remove it from this list?

Contributor


I think it's doable. Let's keep this on our TODO list, still.


Utils
=====

The previous version of `ndb` included an ``ndb.utils`` module, which defined
a number of functions that were mostly used internally. Some of those have been
made obsolete by new Python 3 features, while others have been discarded due
to implementation differences in the new `ndb`.

Possibly the most used utility from this module outside of `ndb` code is the
Contributor


Until we go back and add py27 compatibility, at least.

``positional`` decorator, which declares that only the first `n` arguments of
a function or method may be positional. Python 3 can do this using keyword-only
arguments. What used to be written as::

    @utils.positional(2)
    def function1(arg1, arg2, arg3=None, arg4=None):
        pass

Will be written like this in the new version::

    def function1(arg1, arg2, *, arg3=None, arg4=None):
        pass
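Python enforces keyword-only arguments at call time, so the check that
``@utils.positional(2)`` used to perform no longer needs a decorator: passing
``arg3`` positionally now fails with a ``TypeError`` raised by Python itself. A
quick demonstration, with an illustrative function body:

```python
def function1(arg1, arg2, *, arg3=None, arg4=None):
    # arg3 and arg4 come after the bare `*`, so they are keyword-only.
    return (arg1, arg2, arg3, arg4)


print(function1(1, 2, arg3=3))  # (1, 2, 3, None)

try:
    function1(1, 2, 3)  # arg3 passed positionally
except TypeError as exc:
    print("rejected:", exc)
```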

Exceptions
==========

App Engine's legacy exceptions are no longer available, but `ndb` provides
shims for most of them, which can be imported from the
``google.cloud.ndb.exceptions`` module, like this::

    from google.cloud.ndb.exceptions import BadRequestError, BadArgumentError


typo: exceptioms


Datastore API
=============

There are many differences between the current Datastore API and the legacy App
Engine Datastore. In most cases, where the public API was generally used, this
should not be a problem. However, if your code relied on the private Datastore
API, that code will probably need to be rewritten. Specifically, any function
or method that dealt directly with protocol buffers will no longer work. The
Datastore `.protobuf` definitions have changed
Contributor


This needs to be reworked or elided following the edit above.

significantly from the public API used by App Engine to the current published
API. Additionally, this version of NDB mostly delegates to
`google.cloud.datastore` for parsing data returned by RPCs, which is a
significant internal refactoring.


I'm a little confused here. Are you saying that the old NDB library included some undocumented APIs that dealt directly with protocol buffers? Also, from reading the MIGRATION_NOTES, I'm not sure it would be possible for someone to tell which APIs dealt with protobufs.

How about this:
The old NDB library included some undocumented APIs that dealt directly with Datastore protocol buffers. These APIs will no longer work. Rewrite any code that used the following classes, properties, or methods:

  • ModelAdapter
  • Property._db_get_value, Property._db_set_value
  • Property._db_set_compressed_meaning and Property._db_set_uncompressed_meaning
  • Model._deserialize and Model._serialize
  • model.make_connection


Default Namespace
=================

In the previous version, ``google.appengine.api.namespace_manager`` was used
to determine the default namespace when not passed in to constructors which
require it, like ``Key``. In this version, the client class can be instantiated
with a namespace, which will be used as the default whenever it's not included
in the constructor or method arguments that expect a namespace::

    from google.cloud import ndb

    # Datastore namespace names may only contain letters, digits, ".", "-"
    # and "_", so no spaces.
    client = ndb.Client(namespace="my_namespace")

    with client.context() as context:
        # No namespace argument, so the client's default is used.
        key = ndb.Key("SomeKind", "SomeId")

In this example, the key will be created under the namespace `my_namespace`,
because that's the namespace passed in when setting up the client.

Django Middleware
=================
Contributor

@chrisrossi chrisrossi Jun 26, 2019


So, a couple things here:

  1. A context is thread local. Recommended practice is one context per request. So you probably want to create the client in __init__ but create the context in __call__.

  2. __call__ looks like a great place to use a with statement.

Maybe something more like:

    from google.cloud import ndb

    class NDBMiddleware(object):
        def __init__(self, get_response):
            self.get_response = get_response
            self.client = ndb.Client()

        def __call__(self, request):
            context = self.client.context()
            request.ndb_context = context
            with context:
                response = self.get_response(request)
            return response


If it is any help, putting this in my Django middleware, I am receiving AttributeError: '_GeneratorContextManager' object has no attribute 'use'. Seems to be working without the use but I don't have enough "context" to know if that will break something else.

Putting the context in the __init__ definitely broke things for the second request that uses ndb.

Contributor Author


Thanks a lot, I'm going to test this more and make it work before the merge.

Contributor


Right, I forgot that Client.context calls Context.use for you. I will correct my snippet.


The Django middleware that was part of the GAE version of `ndb` has been
discontinued and is no longer available in current `ndb`. The middleware
basically took care of setting the context, which can be accomplished on
modern Django with a simple class middleware, similar to this::

    from google.cloud import ndb

    class NDBMiddleware(object):
        def __init__(self, get_response):
            self.get_response = get_response
            # One client for the whole process, created once at startup.
            self.client = ndb.Client()

        def __call__(self, request):
            # A context is thread-local, so create a new one for each request
            # and keep it active while the view runs.
            with self.client.context() as context:
                request.ndb_context = context
                return self.get_response(request)

The ``__init__`` method is called only once, during server start, so it's a
good place to create and store the `ndb` client. The ``__call__`` method is
called once for every request; because an `ndb` context is thread-local, it
creates a new context there, adds it to the request, and processes the
response inside it. The context will then be available in view and template
code.

Another way to get an `ndb` context into a request would be to use a `context
processor`, but those are functions called for every request, which means we
would need to initialize the client and context on each request, or find
another way to initialize and get the initial context.

Note that the above code, like other `ndb` code, assumes the presence of the
`GOOGLE_APPLICATION_CREDENTIALS` environment variable when the client is
created. See the Django documentation for details on setting up the environment.
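Because an `ndb` context is thread-local, the middleware should give every
request its own context. That wiring can be checked without Django or Datastore
credentials by injecting a stand-in client. In the sketch below, ``FakeClient``
is hypothetical, and so is the extra ``client`` constructor argument (Django
itself only passes ``get_response``). The fake only mimics the assumed behavior
of ``ndb.Client.context()``: a context manager that yields a new context and
refuses to nest on the same thread.

```python
import contextlib
import threading


class FakeClient:
    """Hypothetical stand-in for ndb.Client; for local testing only."""

    def __init__(self):
        self._local = threading.local()
        self.created = 0  # how many contexts have been handed out

    @contextlib.contextmanager
    def context(self):
        # Refuse a second, nested context on the same thread (assumed to
        # mirror ndb's behavior).
        if getattr(self._local, "active", None) is not None:
            raise RuntimeError("Context is already created for this thread.")
        self.created += 1
        ctx = object()  # placeholder for a real ndb context
        self._local.active = ctx
        try:
            yield ctx
        finally:
            self._local.active = None


class NDBMiddleware:
    def __init__(self, get_response, client):
        self.get_response = get_response
        # `client` is injected here only for testing; the real middleware
        # would create ndb.Client() instead.
        self.client = client

    def __call__(self, request):
        # One fresh context per request, active while the view runs.
        with self.client.context() as context:
            request.ndb_context = context
            return self.get_response(request)


class Request:
    """Minimal object that accepts attributes, like a Django request."""


def view(request):
    return "ok"


client = FakeClient()
middleware = NDBMiddleware(view, client)
r1, r2 = Request(), Request()
results = [middleware(r1), middleware(r2)]
print(results)                               # ['ok', 'ok']
print(client.created)                        # 2
print(r1.ndb_context is not r2.ndb_context)  # True
```

Creating the context inside ``__call__`` rather than ``__init__`` is what lets
consecutive requests on the same thread each enter a fresh context instead of
reusing one that was never entered.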