Support for well-known datatypes like date/time #1098

jclark · 2022-05-14T04:22:13Z

Ballerina currently does not handle data types like date/time well. These data types have a conventional string syntax, but they also have higher-level semantics that can be represented by a Ballerina value that is not just a string. For example, date-time has a conventional string syntax (e.g. RFC 3339/ISO 8601), but the semantics of date-time would be better represented by numbers (e.g. the number of units of time since some epoch for a timestamp, or a triple of year/month/day integers for a date). In Ballerina currently, we have to choose between two alternatives, neither of which is completely satisfactory:

we can use a string, but then we have something at the wrong semantic level
we can use a Ballerina value that captures the semantic, but then we lose the string syntax, which means automatic data-binding with toJson/fromJsonWithType does not work.

For data types built into Ballerina (e.g. decimal and xml) we do not have this problem: we have both the right semantics and the string syntax. It should be possible to do this for other data types also. For example, it should be possible to have a Timestamp data type that:

belongs to anydata
gets converted to/from a string in RFC 3339 format by toJson/fromJsonWithType (similar to the xml type)
is immutable and has no storage identity (like decimal or string)
is semantically a Timestamp, not a string:
1. it is not == to any string/list/mapping
2. any Timestamp value is guaranteed to be a valid timestamp
3. supports operations (ideally using method call syntax) at the semantic level of a timestamp
can be represented internally as a number or a pair of numbers

There are quite a number of other common data types that are like this.

date/time
- timestamp
- date
- time
- duration
UUID
IP address
- v4
- v6
hostname
email address
URL
base64 or base16 binary data

None of these are specific to a particular program: they all fit into the concept of anydata.

JSON schema handles this by allowing an assertion that a string has a specific format. (In theory, JSON schema allows this for values other than strings, but all the formats it defines are for strings.) Protocol Buffers have the concept of a well-known type.

One solution would be for the language to include a separate basic type for each of these. But that wouldn't be a great solution: it should be possible to evolve these data types independently of the language specification. Also, it should not be necessary to include these in the language. We can define each of the data types in terms of concepts we already have:

a name for the type
a data structure being represented (e.g. for date: year, month, day); we will call this the value type
a specialized string syntax for representing the data (e.g. for date it's YYYY-MM-DD)
a mapping between the string syntax and the data structure
constraints on when an instance of the data structure is valid (e.g. the number of days in a month); these go beyond the grammar for the syntax
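As a rough illustration of how these pieces fit together for a date, here is a Python sketch (not Ballerina, and not a proposed API). The names `DateValue`, `parse`, `to_string` and `validate` are hypothetical. Note that `2023-02-29` matches the YYYY-MM-DD grammar but fails the days-in-month constraint, which is exactly the kind of check that lives outside the syntax.

```python
import re
from dataclasses import dataclass

TYPE_NAME = "Date"  # hypothetical name for the type
DATE_SYNTAX = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")  # string syntax YYYY-MM-DD


@dataclass(frozen=True)
class DateValue:
    """The value type: the data structure being represented."""
    year: int
    month: int
    day: int


def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def validate(v: DateValue) -> DateValue:
    """Validity constraints that go beyond the grammar."""
    if not 1 <= v.month <= 12:
        raise ValueError(f"invalid month: {v.month}")
    if not 1 <= v.day <= days_in_month(v.year, v.month):
        raise ValueError(f"invalid day: {v.day}")
    return v


def parse(s: str) -> DateValue:
    """Mapping from the string syntax to the value type (then validated)."""
    m = DATE_SYNTAX.fullmatch(s)
    if m is None:
        raise ValueError(f"not in YYYY-MM-DD syntax: {s!r}")
    return validate(DateValue(*(int(g) for g in m.groups())))


def to_string(v: DateValue) -> str:
    """Mapping from the value type back to the string syntax."""
    return f"{v.year:04d}-{v.month:02d}-{v.day:02d}"
```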

The goal, then, is to devise a language feature that lets us add a data type that works very similarly to a built-in data type, without needing to add something to the language specification for each such data type. If one of these data types were a basic type, then it would

be a subtype of anydata
not be a subtype of json
be immutable
be transformed into a string by toJson
be converted from a string by fromJsonWithType
support == and !=
not have storage identity; either
- === means ==, or
- === also considers another aspect of the data (e.g. precision for decimal or timezone for timestamp)
support relational operators (< <= > >=), where that makes sense
have a literal syntax (probably using backticks)
allow const values
be round-trippable by future fromBalString and toBalString functions

So, for example, if data:Timestamp referred to a timestamp type, then the following would work:

type LogEntry record {|
   string message;
   data:Timestamp timestamp;
|};

json j = { message: "An error occurred", timestamp: "2022-05-14T11:20:00-07:00" };
LogEntry entry = check j.fromJsonWithType();
json j2 = entry.toJson();
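A Python model of the data binding this example relies on, assuming the Timestamp field is converted from a JSON string on the way in and back to a string on the way out. `Timestamp`, `from_json_with_type` and `to_json` are hypothetical stand-ins for data:Timestamp, fromJsonWithType and toJson. Parsing uses Python's `datetime.fromisoformat`, which is more lenient than strict RFC 3339 and accepts a trailing `Z` only from Python 3.11, so the example uses a numeric offset.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Dict


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical stand-in for data:Timestamp: an immutable, validated value."""
    value: datetime

    @staticmethod
    def from_string(s: str) -> "Timestamp":
        dt = datetime.fromisoformat(s)
        if dt.tzinfo is None:
            raise ValueError("an RFC 3339 timestamp needs a UTC offset")
        return Timestamp(dt)

    def to_string(self) -> str:
        return self.value.isoformat()


@dataclass(frozen=True)
class LogEntry:
    message: str
    timestamp: Timestamp


def from_json_with_type(j: Dict[str, Any], cls: type) -> Any:
    """Bind a JSON object to a record, parsing strings for Timestamp fields."""
    kwargs = {}
    for f in fields(cls):
        v = j[f.name]
        kwargs[f.name] = Timestamp.from_string(v) if f.type is Timestamp else v
    return cls(**kwargs)


def to_json(record: Any) -> Dict[str, Any]:
    """Convert a record back to JSON, turning Timestamp fields into strings."""
    out: Dict[str, Any] = {}
    for f in fields(record):
        v = getattr(record, f.name)
        out[f.name] = v.to_string() if isinstance(v, Timestamp) else v
    return out
```

Because `Timestamp` is its own class, `entry.timestamp` is never `==` to the string it was parsed from, while `to_json(entry)` gives back the original JSON.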


jclark · 2022-06-17T04:18:56Z

YAML has the concept of scalar nodes, which are defined as "an opaque datum that can be presented as a series of zero or more Unicode characters". Scalar nodes (like other kinds of node) can have a tag. "Scalar tags must also provide a mechanism for converting formatted content to a canonical form for supporting equality testing."

jclark · 2022-07-01T00:58:18Z

The concept for the language feature is to introduce a new basic type, called string-formatted data, or sdata, for representing data that is conventionally represented in a specialized string format. The sdata basic type is readonly, and is included in anydata but not json. Like the simple data types and string, sdata values do not have storage identity.

The sdata basic type is divided into named subtypes, one for each string format. The semantics of each named subtype is defined in terms of an underlying value type, which is a subtype of anydata, together with conversion operations between that underlying type and its string format. For example, a timestamp data type might be defined with a value type of [int, decimal] (with the same semantics as time:Utc), with conversion operations that convert to and from RFC 3339 string format (a subset of ISO 8601).
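A minimal Python sketch of such a conversion pair, assuming the underlying value is (whole seconds since the Unix epoch, fraction of a second), mirroring the [int, decimal] shape of time:Utc. The function names are hypothetical, leap seconds are not handled, and output is always normalized to UTC with `Z`, so the original offset is not preserved. Keeping the fraction as a `Decimal` preserves its precision (`.50` stays `.50`), which is relevant to how === treats decimals.

```python
import re
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Tuple

# Hypothetical underlying value: (seconds since 1970-01-01T00:00:00Z, fraction in [0, 1)).
Utc = Tuple[int, Decimal]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
RFC3339 = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def parse(s: str) -> Utc:
    """RFC 3339 string -> underlying value (leap seconds are not handled)."""
    m = RFC3339.fullmatch(s)
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {s!r}")
    y, mo, d, h, mi, sec = (int(g) for g in m.groups()[:6])
    frac, off = m.group(7), m.group(8)
    if off in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = -1 if off[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(off[1:3]), minutes=int(off[4:6])))
    # datetime() raises ValueError for dates that do not exist, e.g. Feb 29 2023
    delta = datetime(y, mo, d, h, mi, sec, tzinfo=tz) - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    return (seconds, Decimal("0" + frac) if frac else Decimal(0))


def to_string(v: Utc) -> str:
    """Underlying value -> RFC 3339 string, normalized to UTC ("Z")."""
    secs, frac = v
    dt = EPOCH + timedelta(seconds=secs)
    s = (f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
         f"T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")
    if frac.as_tuple().exponent < 0:  # keep the fraction's precision, e.g. ".50"
        s += format(frac, "f")[1:]
    return s + "Z"
```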

A program constructs a literal value of type sdata by using the subtype name followed by the string representation in backticks.

The language specification defines

the contents and the semantics of the definition
how the definition is used to implement the language-defined operations on the type.

Definitions can be provided either by

the language: in this case, the definition is part of the language specification
the platform (via the standard library): in this case, the definition is supplied by the platform using a mechanism that is internal to the platform, rather than by the language specification.

There is no mechanism for definitions to be provided from outside the platform. The platform will only define subtypes that are widely interoperable. This preserves the program-independent aspect of anydata.

jclark · 2022-07-01T01:02:49Z

The definition provides the following information:

a tag name: this is an unqualified identifier (restricted in the same way as a module name), which uniquely identifies the format and the named subtype; a value of a type with tag f can be constructed using f with backticks, e.g. f`d`, where d is the string representation of the value in that format
a type name: this is a Ballerina identifier, which follows the conventions for a type that is not a keyword (i.e. CamelCase); type names are also unique for each named subtype
the underlying value type, which is a subtype of anydata; operations on the named subtype can be expressed in terms of operations on the underlying value type
primitive functions that operate on the underlying value type; some of these need to be available at compile-time as well as runtime:
- a toString function to convert the underlying value to a string
- a parse function that creates the underlying value from the string
- a validate function that tests the validity of an underlying value
- a function to allow templates with insertions
a module: this is a normal Ballerina module that provides publicly-accessible functions that can be applied to values of the sdata subtype
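One way to picture a definition is as a record with one field per item in this list. The Python sketch below uses UUID as the example because Python's standard `uuid` module already provides parsing and formatting. The `Definition` class, the registry, the (tag, value) tuple used for an sdata value, and the module name `ballerina/data.uuid` are all hypothetical, and template insertions are not modeled.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Definition:
    """Hypothetical shape of a named-subtype definition."""
    tag: str                          # used as tag`...` in backtick construction
    type_name: str                    # CamelCase type name, e.g. data:Uuid
    value_type: type                  # underlying anydata value type
    to_string: Callable[[Any], str]   # underlying value -> string
    parse: Callable[[str], Any]       # string -> underlying value
    validate: Callable[[Any], Any]    # checks an underlying value
    module: str                       # module with type-specific functions


def _validate_uuid(v: int) -> int:
    if not 0 <= v < 2 ** 128:
        raise ValueError("UUID value out of range")
    return v


UUID_DEF = Definition(
    tag="uuid",
    type_name="Uuid",
    value_type=int,  # the underlying value is a 128-bit unsigned integer
    to_string=lambda v: str(uuid.UUID(int=v)),
    parse=lambda s: uuid.UUID(s).int,
    validate=_validate_uuid,
    module="ballerina/data.uuid",
)

REGISTRY: Dict[str, Definition] = {UUID_DEF.tag: UUID_DEF}


def construct(tag: str, text: str) -> Tuple[str, Any]:
    """Model of tag`text` with no insertions: parse, validate, pair with the tag."""
    d = REGISTRY[tag]
    return (tag, d.validate(d.parse(text)))
```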

jclark · 2022-07-01T01:10:34Z

sdata values support the following operations:

s1 == s2 is true if s1 and s2 belong to the same sdata named subtype and their underlying values are ==
=== does not use storage identity (see below for details)
ordering uses ordering of the underlying value
backtick construction using the tag name
- when there are no insertions, this is the same as calling, at compile time, the function that converts from a string, and giving a compile error if the function returns an error
- this is a const expression
- these expressions would be allowed as a type descriptor, so that it is possible to have singleton subtypes of sdata
- with insertions, it's more complicated (see below)
ToString in expression style will return a backtick expression
ToString in informal style will just return the string representation
toJson will convert sdata values to strings
fromJsonWithType will convert strings to sdata values
method call syntax can be used; the method name will be looked up first as a function in the named subtype's module
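The following Python sketch models these operations for a hypothetical `date` tag, with the underlying value as a (year, month, day) tuple. `SData`, `backtick`, `to_json` and `from_json_with_type` are illustrative names only. In the proposal, backtick construction without insertions would happen at compile time; here it simply raises `ValueError` at run time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Tuple


def _parse_date(s: str) -> Tuple[int, int, int]:
    d = date.fromisoformat(s)  # raises ValueError for invalid dates
    return (d.year, d.month, d.day)


def _date_to_string(v: Tuple[int, int, int]) -> str:
    return date(*v).isoformat()


# Hypothetical per-tag primitives: tag -> (parse, to_string).
PRIMITIVES: Dict[str, Tuple[Callable[[str], Any], Callable[[Any], str]]] = {
    "date": (_parse_date, _date_to_string),
}


@dataclass(frozen=True)
class SData:
    """Hypothetical sdata value: a tag plus an immutable underlying value.
    The generated __eq__ gives ==: same tag and == underlying values."""
    tag: str
    value: Any

    def __lt__(self, other: "SData") -> bool:
        # ordering uses the ordering of the underlying value, within one subtype
        if self.tag != other.tag:
            raise TypeError("cannot order values of different sdata subtypes")
        return self.value < other.value

    def to_string_informal(self) -> str:
        return PRIMITIVES[self.tag][1](self.value)

    def to_string_expression(self) -> str:
        return f"{self.tag}`{self.to_string_informal()}`"


def backtick(tag: str, text: str) -> SData:
    """Model of tag`text` with no insertions."""
    return SData(tag, PRIMITIVES[tag][0](text))


def to_json(v: Any) -> Any:
    return v.to_string_informal() if isinstance(v, SData) else v


def from_json_with_type(j: str, tag: str) -> SData:
    return backtick(tag, j)
```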

jclark · 2022-07-01T01:13:33Z

There is a langlib module, lang.sdata, for this new sdata basic type, which provides the following:

a definition of sdata:Any as a type that includes the whole sdata type (again using the @builtinSubtype annotation)
functions that operate on arbitrary values of type sdata e.g.
- a function to return the tag name of the value
- a function to get the underlying value as readonly & anydata

User programs would not typically need to import the lang.sdata module (just as they do not need to import the lang.value module).

When method call syntax is used for an sdata value, the function is searched for in order in the following modules (this is similar to what happens for existing basic types):

the module for the type
the lang.sdata module
the lang.value module
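A Python sketch of that lookup order, treating each module as a dictionary of functions and an sdata value as a (tag, underlying value) pair. All module contents here are hypothetical. Because the subtype's own module is searched first, a function it defines (such as a more precisely typed `value`) shadows a function of the same name in lang.sdata.

```python
from typing import Any, Callable, Dict, List, Tuple

Module = Dict[str, Callable[..., Any]]
SDataValue = Tuple[str, Any]  # (tag, underlying value): hypothetical representation

# Hypothetical module contents, for illustration only.
TYPE_MODULES: Dict[str, Module] = {
    "date": {  # stands in for the module of the date subtype
        "year": lambda v: v[1][0],
        "value": lambda v: v[1],  # per the proposal, a more precisely typed value()
    },
}
LANG_SDATA: Module = {
    "tag": lambda v: v[0],
    "value": lambda v: v[1],
}
LANG_VALUE: Module = {
    "toString": lambda v: str(v[1]),
}


def resolve_method(v: SDataValue, name: str) -> Callable[..., Any]:
    """Search the subtype's module, then lang.sdata, then lang.value."""
    search_path: List[Module] = [TYPE_MODULES.get(v[0], {}), LANG_SDATA, LANG_VALUE]
    for module in search_path:
        if name in module:
            return module[name]
    raise AttributeError(f"no method {name!r} for sdata tag {v[0]!r}")


def call_method(v: SDataValue, name: str, *args: Any) -> Any:
    """Model of v.name(args...): the receiver is passed as the first argument."""
    return resolve_method(v, name)(v, *args)
```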

jclark · 2022-07-01T01:15:12Z

The standard library provides a ballerina/data module. The data prefix is predeclared to refer to ballerina/data. For every named subtype with type name T (both platform-defined and language-defined), the ballerina/data module provides a public definition of a type T that refers to the named type. Thus a program can refer to any sdata named type using a qualified identifier of the form data:T, without needing any import.

The module for a standard-library-defined type with tag t is ballerina/data.t. This is a normal Ballerina module. The only difference is that another module can use method call syntax to make calls to functions in this module without having imported it (as with langlib modules). Each of these modules defines a standard set of types and primitive functions:

a type Value that refers to the underlying anydata value for data:T
a function fromValue(Value) returns T|error
a function fromString(string) returns T|error
a function parse(string) returns readonly & Value|error
a function validate(Value) returns Value|error
a function value(data:T) returns readonly & Value (this is a more precisely typed version of the function in lang.sdata)

It then also provides type-specific functions that can be implemented in terms of these primitive functions; each of these will usually take data:T as its first argument so that it can be called using method call syntax.
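A Python sketch of such a module for a hypothetical `date` tag, showing `fromString` built from `parse` and `fromValue`, and a type-specific function built only from the primitives. The Python names are snake_case versions of the names listed above; `add_days` is a hypothetical example, not a proposed function.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import NamedTuple


class Value(NamedTuple):
    """Hypothetical underlying value type for data:Date."""
    year: int
    month: int
    day: int


@dataclass(frozen=True)
class Date:
    """Stand-in for data:Date; meant to be built only via from_value/from_string."""
    _value: Value


_SYNTAX = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def validate(v: Value) -> Value:
    date(v.year, v.month, v.day)  # raises ValueError if the date does not exist
    return v


def parse(s: str) -> Value:
    m = _SYNTAX.fullmatch(s)
    if m is None:
        raise ValueError(f"not a date: {s!r}")
    return Value(*(int(g) for g in m.groups()))


def from_value(v: Value) -> Date:
    return Date(validate(v))


def from_string(s: str) -> Date:
    return from_value(parse(s))


def value(d: Date) -> Value:
    return d._value


def add_days(d: Date, n: int) -> Date:
    """Type-specific function using only the primitives; Date comes first so
    that, in the proposal, it could be called with method call syntax."""
    v = value(d)
    moved = date(v.year, v.month, v.day) + timedelta(days=n)
    return from_value(Value(moved.year, moved.month, moved.day))
```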

The mechanism that the ballerina/data and ballerina/data.t modules use to provide these definitions in ballerina/data depends on the internal mechanism the platform uses to define the named subtypes.

jclark · 2022-07-01T03:28:02Z

Currently we have two kinds of equality:

equality: the == operator uses this
exact equality: the === operator uses this

There are two differences between equality and exact equality:

for structures (where the basic type is mutable), equality uses the current state of the structure (what the structure contains), whereas exact equality uses the storage identity of the structure
for some simple values, === makes finer distinctions than ==
1. for float +0 and -0 are == but not ===
2. for decimal, === considers precision whereas == just considers the mathematical values

When the unpacked representation of an sdata value contains a decimal (which it probably will for types involving time), then === for the sdata value needs to consider the precision (because values that are === should be indistinguishable) but not the storage identity. This means we need another kind of equality, which

for structures, is like == in that it considers what the structure contains not its storage identity
for simple values, is like ===

Let's call this precise equality. Then

for sdata (and all simple values and string), precise equality and exact equality are the same thing
for sdata, exact equality and precise equality are defined as precise equality on the underlying value type
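A small Python model of the distinction, assuming a timestamp's underlying value is an (int, Decimal) pair. `precise_equal` compares decimals by sign, digits and exponent (so `0.5` and `0.50` differ) while comparing structures by their contents rather than identity; all function names are hypothetical.

```python
from decimal import Decimal
from typing import Any


def precise_equal(a: Any, b: Any) -> bool:
    """Model of precise equality on an underlying value: like == for
    structures, like === for simple values (decimals must match in precision)."""
    if isinstance(a, Decimal) and isinstance(b, Decimal):
        return a.as_tuple() == b.as_tuple()  # sign, digits and exponent
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(precise_equal(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b


def sdata_equal(tag1: str, v1: Any, tag2: str, v2: Any) -> bool:
    """Model of ==: same named subtype and == underlying values."""
    return tag1 == tag2 and v1 == v2


def sdata_exact_equal(tag1: str, v1: Any, tag2: str, v2: Any) -> bool:
    """Model of ===: same named subtype and precisely equal underlying values."""
    return tag1 == tag2 and precise_equal(v1, v2)
```

With `a = (1652552400, Decimal("0.5"))` and `b = (1652552400, Decimal("0.50"))`, the two values are `==` but not `===`, because the fractions differ in precision.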

jclark · 2022-10-08T03:30:20Z

Most of this has been done as part of adding #1132. We are calling these things tagged data types.

Compared to what was described earlier, we haven't yet needed to expose the value data structure.

jclark added this to the Swan Lake Update 3 milestone May 14, 2022

jclark self-assigned this May 14, 2022

This was referenced Jun 24, 2022:

- Allow type descriptor to determine construction of value from RawTemplate #1131 (open)
- Add regular expression type to language #1132 (closed)

jclark modified the milestones: Swan Lake Update 3, Swan Lake updates Jul 26, 2022

jclark modified the milestones: Swan Lake updates, Swan Lake Update 4 Oct 8, 2022

jclark modified the milestones: 2023R1, 2013R2 Apr 25, 2023

jclark mentioned this issue Oct 25, 2023:

- Add url tagged data type #1270 (open)

poorna2152 mentioned this issue Jul 19, 2024:

- [New Feature]: Add date/time type to language ballerina-platform/ballerina-lang#43128 (closed)

anupama-pathirage added Type/NewFeature Area/Lang Relates to the Ballerina language specification labels Nov 20, 2024
