Simplifying the chain type definition with named tuples #8

bicycle1885 · 2024-09-25T14:45:55Z

I want to suggest a simpler way to define the chain types. Instead of defining separate types for annotated and unannotated chains, we can use a named tuple to define an extensible chain type. This avoids the complexity of handling two different types and simplifies both the interface and the implementation.

The main points are:

Define a chain type having a named tuple of properties as a field so that we can add new properties without changing the type definition.
Make the chain type immutable to avoid unnecessary memory allocations.
Define types for chain and residue properties to allow for different types of properties.

First, we define the property types for the chain type:

abstract type AbstractProperty end

# Chain-level properties
struct ChainProperty{T} <: AbstractProperty
    value::T
end

Base.getindex(prop::ChainProperty) = prop.value

# Residue-level properties (the last dimension of the array corresponds to residues)
struct ResidueProperty{T, A <: AbstractArray{T}} <: AbstractProperty
    values::A
end

Base.getindex(prop::ResidueProperty) = prop.values
Base.getindex(prop::ResidueProperty, i::AbstractVector) = selectdim(prop.values, ndims(prop.values), i)
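
To make the indexing behavior of the two property types concrete, here is a minimal sketch that repeats the definitions above and exercises them. The variable names (score, per_residue, subset) and the 2×10 example array are illustrative assumptions, not part of the proposal. Note that selectdim returns a view, so a residue subset shares memory with the original array.

```julia
abstract type AbstractProperty end

# Chain-level property: a single value for the whole chain
struct ChainProperty{T} <: AbstractProperty
    value::T
end
Base.getindex(prop::ChainProperty) = prop.value

# Residue-level property: the last array dimension indexes residues
struct ResidueProperty{T, A <: AbstractArray{T}} <: AbstractProperty
    values::A
end
Base.getindex(prop::ResidueProperty) = prop.values
Base.getindex(prop::ResidueProperty, i::AbstractVector) = selectdim(prop.values, ndims(prop.values), i)

# Illustrative data (assumed): one scalar score, and 2 values per residue for 10 residues
score = ChainProperty(0.1)
per_residue = ResidueProperty(collect(reshape(1:20, 2, 10)))

score[]                       # unwraps the chain-level value
subset = per_residue[1:3]     # selects residues 1 to 3 along the last dimension
size(subset)                  # (2, 3)

# The subset is a view, so writing to it also changes the parent array
subset[1, 1] = 100
per_residue[][1, 1] == 100    # true
```

If independent copies are preferred when slicing a chain, wrapping the result in copy would be one option, at the cost of an extra allocation.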

Then, we define the chain type, which stores its properties in a named tuple:

const NamedProperties{names} = NamedTuple{names, <: Tuple{Vararg{AbstractProperty}}}

struct Chain{T, Ps <: NamedProperties}
    id::String
    sequence::String
    backbone::Array{T, 3}
    numbering::Vector{Int64}
    atoms::Vector{Vector{Atom{T}}}
    properties::Ps
end

Base.getproperty(chain::Chain, name::Symbol) =
    name == :id ? getfield(chain, :id) :
    name == :sequence ? getfield(chain, :sequence) :
    name == :backbone ? getfield(chain, :backbone) :
    name == :numbering ? getfield(chain, :numbering) :
    name == :atoms ? getfield(chain, :atoms) :
    name == :properties ? getfield(chain, :properties) :
    getfield(getfield(chain, :properties), name)

function Base.getindex(chain::Chain, i::AbstractVector)
    properties = map(p -> p isa ChainProperty ? ChainProperty(getindex(p)) : ResidueProperty(getindex(p, i)), chain.properties)
    Chain(chain.id, chain.sequence[i], chain.backbone[:,:,i], chain.numbering[i], chain.atoms[i], properties)
end

annotate(chain::Chain; annotations...) =
    Chain(chain.id, chain.sequence, chain.backbone, chain.numbering, chain.atoms, merge(chain.properties, annotations))

This works as follows:

# Create a chain with random data
chain = Chain(
    "myid",
    "ABCDEFGHIJ",
    randn(3, 3, 10),
    [1:10;],
    [Atom{Float64}[] for _ in 1:10],
    (score = ChainProperty(0.1), others = ResidueProperty(randn(10))),
)

# Select the first 5 residues
x = chain[1:5]

# Access some properties
x.id, x.sequence, x.score, x.others

# Annotate the chain with a new property
annotate(chain, label = ChainProperty("foo bar"))

If type stability is not a concern, we can remove the type parameter of the properties in the Chain type.
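
As a rough sketch of that alternative, the properties field can be declared with the abstract NamedTuple type instead of a type parameter. The name LooseChain is hypothetical, the atoms field is left out so the snippet runs without the package's Atom type, and the AbstractProperty constraint on the tuple elements is dropped for brevity. With this layout, adding a property no longer changes the chain's type, but access to properties is no longer inferable by the compiler.

```julia
abstract type AbstractProperty end

struct ChainProperty{T} <: AbstractProperty
    value::T
end

# Hypothetical variant: `properties` is abstractly typed, so the set of
# property names and types is not part of the chain's type
struct LooseChain{T}
    id::String
    sequence::String
    backbone::Array{T, 3}
    numbering::Vector{Int64}
    properties::NamedTuple
end

chain = LooseChain("myid", "ABC", randn(3, 3, 3), [1, 2, 3], (score = ChainProperty(0.1),))

# Adding a property builds a new chain with a merged named tuple
annotated = LooseChain(chain.id, chain.sequence, chain.backbone, chain.numbering,
    merge(chain.properties, (label = ChainProperty("foo bar"),)))

typeof(chain) == typeof(annotated)                       # true
isconcretetype(fieldtype(typeof(chain), :properties))    # false
```

The trade-off is that code reading chain.properties in a hot loop may need a function barrier or a type assertion to recover performance.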

AntonOresten mentioned this issue Sep 28, 2024

Simplify properties #9

Merged

AntonOresten closed this as completed in #9 Oct 7, 2024
