-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2.0 feature request: Require that sessions are JSON serializable #2709
Comments
If JSON supported datetime this would be a no brainer for me. |
@mmerickel there's gotta be a way to solve that problem though, with custom serializer/deserializer perhaps? @ztane I don't want to add a 3rd party library to Pyramid core. |
@bertjwregeer deserialization is where things get shady. On the serialization side you basically end up marshaling a custom syntax into a string, or go even further and instead of a straight json dump you instead iterate through and build out private structures containing the type information. This is all possible but then there's a fine line between this and basically re-inventing pickle. I'm not ruling anything out right now but I think it'd ideally be something general purpose, pluggable with custom types and not in pyramid's core. This way people could use it in their own session factories and they could add serializers for types like datetime and any other custom objects. |
I was suggesting we support only the lowest common denominator. Basically If you want to have an I was simply asking for the default to be changed. |
Yeah... that's not actually helpful for creating a session ecosystem though. Addons need to be able to use the session without worrying about the specific implementation. That's why pickle is so great right now - addons don't need to care about the serialization format and they can throw datetimes in there and get them back. If we switch to JSON in the way you suggested, they can no longer know what they will get back for any non-primitive types. The addon will need to ask itself... can I even store this object? What will I get back, if it even takes it? That's not an improvement. |
IMHO storing anything but strings or small bits of data in the session is something that shouldn't be encouraged anyway. Maybe there should be a way for the session to be interrogated before app startup, so that an extension can ask it what the serializer is, and fail out if it doesn't provide certain guarantees. This way for most cases deployers can be safer (and not possible allow an RCE if their secret is leaked), yet if required they can switch to a different serializer to support their add-ons and the ecosystem around it. |
I switched from JSON to Msgpack a while back to minimize the size of server-side sessions in Redis (Pickle is FAST for builtins and standard-lib types, but the payload consumes more bytes than desired when using a LRU cache memory store). The standard msgpack and json libs have similar approaches and implementations to serialization, and I will underscore mmerickel's concern that compatibility is an issue. The concern isn't just for custom data types which might be stored in a session, but also
In terms of If you were to implement something / require a restriction... it would make sense to have a default JSON serializer that can be over-ridden by the user, and/or support an API where 3rd party plugins can register (de)serialization formats for non-standard types. You really don't want to support serialization for any unused object types. (FWIW, I found encoding datetime&date as epoch seconds was the fastest option for deserialization.) |
FWIW Django switched the default serializer to JSON (but kept the ability to set a Pickle serializer) and afaik apps just dealt with it and it hasn't caused any problems. |
FTR I don't have a problem making this switch. I think there are several cons regarding json's limited type support but I think overall it's a win. |
FWIW It might also make sense to use msgpack and use custom types to represent common Python datatypes to allow them to still be serialized. This works a bit better in msgpack than it does in JSON because msgpack has type information in it's serialization format so you can register your own types (represented by an integer) and when it deserializes you get that integer + the data back out. That avoids needing to do things like guess about datetimes or what have you. Something like: import enum
import datetime
import struct
import msgpack
class Types(enum.IntEnum):
datetime = 1
def default(obj):
if isinstance(obj, datetime.datetime):
return msgpack.ExtType(
Types.datetime.value,
struct.pack(">I", int(obj.timestamp())),
)
raise TypeError("Unknown type: %r" % (obj,))
def ext_hook(code, data):
type_ = Types(code)
if type_ == Types.datetime:
return datetime.datetime.fromtimestamp(struct.unpack(">I", data)[0])
return msgpack.ExtType(code, data)
data = {"foobar": datetime.datetime.now()}
packed = msgpack.packb(data, default=default, use_bin_type=True)
unpacked = msgpack.unpackb(packed, ext_hook=ext_hook, encoding="utf-8")
# \/ This actually fails because milliseconds don't roundtrip via timestamp
# but you get the idea.
assert data == unpacked |
Is |
fwiw, it's much faster and more compact than |
I am a -1 on setting the default to something that requires installing another dependency. I have no problem with making it easily switchable (including back to pickle). |
I haven't really taken any time to evaluate msgpack but I will say that if it's a pure-python (or well-supported), trustable dependency then I'm not against it. |
msgpack has a C accelerator but it falls back to pure Python when that can't compile (or on PyPy). |
I love msgpack, it is awesome especially if you need to interface with C code, or if you need to dump shtloads of data. However I don't think it adds anything over JSON in this case, especially without speed-ups. If serialization of complex object structures were needed, then I'd rather have https://pypi.python.org/pypi/cbor2. Supports cycles. Pure python. |
FWIW, I am +1 on breaking bw compat in 2.X, moving to an interface that mandates that session data be JSON serializable. We could provide a PickleSession bw compat shim and instructions on how it would be used and point to these instructions when a deserialization or serialization error is raised within the default cookie serializer. |
I like @mmerickel's initial idea of defaulting to JSON but being pluggable. this would allow people to switch to pickle/msgpack/etc or even specify a json encoder that can handle datetime. while i understand his later concern over 'what happens if a 3rd party plugin needs to support X/Y/Z' -- that's what unittests are for, and the plugin can recommend encoder/decoders. |
Yeah it sounds like Pyramid would define JSON data types as the target and people are free to use something more strict (like pickle). The downside for any addon / app that wanted richer types (like what pickle provided) would be that they may need to document more clearly the types of objects they store in the session and the session impl would then need to try to accommodate that but it'd be "not pyramid's problem". |
@jvanasco what you describe is pretty much already the case, it's just that technically a session implementation currently must support the serialization of all pickleable objects to meet the API. In 2.0, it will be changed such that it must support only the serialization of all JSON-serializable objects, and specialized sessioning implementations can abide by less strict rules, as Michael said. |
See #3353 for how I deprecated pickle-based sessions in pyramid 1.10 in order to provide users with some info about the upcoming change in 2.0. |
Currently the
ISession
interface requires that that keys and values of a session must be pickleable.My proposal is to drop that requirement and simply state that all keys and values should be JSON serializeable. We should still give users the option to use the pickle serializer, but pick a more safe default.
Cookie based sessions right now by default use a PickleSerializer and then the cookie is protected using a secret and HMAC hashes. However if an attacker was able to get the the secret, they could potentially use the unpickle methods to execute code in the context of the running Python process.
The text was updated successfully, but these errors were encountered: