
LBA in Julia / Turing: how to specify the input as vectors #1

Open
DominiqueMakowski opened this issue Jun 14, 2023 · 6 comments

@DominiqueMakowski (Owner)
@kiante-fernandez I thought I'd first ask you as it's probably a basic issue of Turing specification.

I am testing this tutorial and trying to modify it so that it works as if the data were stored in a DataFrame. Similarly to other Turing models, I thought I'd change the input from a tuple of data to two vectors.

I tried the following:

using Turing
using SequentialSamplingModels
using Random
using DataFrames
using LinearAlgebra

# Generate some data
Random.seed!(254)
data1 = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))
data1[!, :condition] = repeat(["A"], nrow(data1))
data2 = DataFrame(rand(LBA(ν=[1.0, 1.5], A=0.7, k=0.2, τ=0.3), 500))
data2[!, :condition] = repeat(["B"], nrow(data2))

data = vcat(data1, data2)

# Specify model
@model model(choice, rt) = begin
    min_rt = minimum(rt)  # Get min RT
    ν ~ MvNormal(zeros(2), I * 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)
    data ~ LBA(; ν, A, k, τ)
end


sample(model(data[!, :choice], data[!, :rt]), Prior(), 5000)
ERROR: MethodError: no method matching iterate(::LBA{Vector{Float64}, Float64, Float64, Float64})

I suspect this is because LBA() returns a tuple, but I'm not sure what to do here. I tried things like:

(choice, rt) ~ LBA(; ν, A, k, τ) 

but it doesn't seem to be the solution. Any thoughts?

@kiante-fernandez

One solution is to parse the DataFrame within the model specification function to create a NamedTuple. This also makes calling sample a bit cleaner.

@model function model(data; n_col = size(unique(data.condition))[1])
    data = (choice = data.choice, rt = data.rt)
    min_rt = minimum(data[2])
    ν ~ filldist(MvNormal(zeros(2), I * 2), n_col)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)
    data ~ LBA(; τ, A, k, ν)
    return (; data, ν, A, k, τ)
end

sample(model(data), Prior(), 5000)

Also, note that filldist is used to draw the two drifts from MVN for each of the two conditions.

Now, I know that does not really answer your question, but looking into this generated some thoughts/comments.

Two things: first, when using Turing I have not encountered an instance where a distribution returns a NamedTuple of size two (both choice and RT) and you supply two variables on the left-hand side (e.g. (choice, rt) ~ ...).

Secondly, while I have little experience creating custom distributions for use in Turing, the LBA.jl file seems to be missing some of the required method implementations. That can lead to similar errors; see: link to relevant thread.

Let me know what you think.

@DominiqueMakowski (Owner, Author)

Thanks!

the LBA.jl file seems to be missing some required implementations of a set of methods

Couldn't we open an issue/PR in SequentialSamplingModels (SSM) to add the potentially missing methods? Do you have some in particular in mind?

@kiante-fernandez

Yeah! Sounds like we are on the same page. I noticed there is a suggested set in the Distributions.jl documentation. I'll open an issue for it.

@DominiqueMakowski (Owner, Author) commented Jun 17, 2023

I think it's going somewhere, the sampling works, but there's one critical piece of the puzzle missing: How to specify "predictors" in the model. Say I have the following base model:

using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra
using DataFrames
using StatsPlots
using StatsModels

@model function model_lba(data)
    data = (choice=data.choice, rt=data.rt)
    min_rt = minimum(data[2])

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    return (; data, drift, A, k, tau)
end

(Note that I'm not using symbols for parameter names because I expect this could cause issues when the output is exported, say, in R.)

And my data is like that:

# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
data1 = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))
data1[!, :condition] = repeat(["A"], nrow(data1))
data2 = DataFrame(rand(LBA(ν=[1.0, 1.5], A=0.7, k=0.2, τ=0.3), 500))
data2[!, :condition] = repeat(["B"], nrow(data2))

data = vcat(data1, data2)

My current intuition is to have a first data-preparation step to get the input data as a model-matrix using StatsModels:

# Get input data
f = @formula(rt ~ 1 + condition)
f = apply_schema(f, schema(f, data))

X = modelmatrix(f, data)
1000×2 Matrix{Float64}:
 1.0  0.0
 1.0  0.0
 1.0  0.0
 ⋮
 1.0  1.0
 1.0  1.0
 1.0  1.0
_, predictors = coefnames(f)
2-element Vector{String}:
 "(Intercept)"
 "condition: B"

The question is: how to model, for instance, different drift rates according to this predictor data 🤔
I can see how it is done for a regular linear model, but here I am not sure at all. Do we need to specify a linear formula in the LBA() likelihood call?

# Likelihood
data ~ LBA(; τ=tau, A=A, k=k, 
           ν=intercept .+ x * drift)

or something like that?
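To check the shapes such a linear formula would produce, here is a minimal sketch with hypothetical coefficient values (plain Julia, no Turing or SequentialSamplingModels involved):

```julia
# Hypothetical 4-trial design matrix, matching the layout of
# modelmatrix(f, data) above: an intercept column plus a dummy
# column that is 1 for condition "B".
X = [1.0 0.0;
     1.0 0.0;
     1.0 1.0;
     1.0 1.0]

# One coefficient per predictor: beta[1] is the intercept (condition A),
# beta[2] is the effect of condition B.
beta = [3.0, -1.5]

# Per-trial drift: trials in A get 3.0, trials in B get 3.0 - 1.5 = 1.5.
drift = X * beta
```

With this layout, `X * beta` yields one drift value per trial, which is the quantity an expression like `intercept .+ x * drift` would be computing inside the model.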

@kiante-fernandez

Based on this issue (itsdfish/SequentialSamplingModels.jl#27), I realize I never pressed comment here.

For each parameter, to model trial-level effects, you would indeed want to specify the parameter as a linear combination. This allows you to interpret the DDM parameters, such as mean functions, within a regression-like framework.

For example, in the case of drift:
ν = β_0 + x * β_1.
Now you have a drift intercept, which characterizes a stimulus-independent average rate of evidence accumulation, sometimes referred to as the drift criterion or drift bias.

However, in your case that would not be directly applicable, because you are using treatment (reference) coding for the condition.

So overall I think folks will still have to specify the model for now.
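Since the LBA's ν is a vector with one drift per accumulator, the linear combination above has to be applied per accumulator. A minimal sketch with made-up coefficient values:

```julia
# Hypothetical per-accumulator coefficients:
beta0 = [3.0, 2.0]    # intercepts: accumulator drifts in the reference condition
beta1 = [-2.0, -0.5]  # shifts applied in condition B

x = 1.0               # dummy-coded predictor (1 = condition B, 0 = condition A)

# ν = β_0 + x * β_1, broadcast over the two accumulators
nu = beta0 .+ x .* beta1
```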

@DominiqueMakowski (Owner, Author)

I think that LBA, because of its double-drift parameter, is a bit more tricky to set:

using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra
using Distributions
using DataFrames
using StatsPlots
using StatsModels


# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
data1 = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))
data1[!, :condition] = repeat(["A"], nrow(data1))
data2 = DataFrame(rand(LBA(ν=[1.0, 1.5], A=0.7, k=0.2, τ=0.3), 500))
data2[!, :condition] = repeat(["B"], nrow(data2))

data = vcat(data1, data2)


# Format input data
f = @formula(rt ~ 1 + condition)
f = apply_schema(f, schema(f, data))

_, predictors = coefnames(f)
X = modelmatrix(f, data)


# Model
@model function model_lba(data; intercept=nothing, condition=nothing, min_rt=minimum(data.rt))

    # Priors for auxiliary parameters
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Priors over coefficients
    beta_intercept ~ Normal(0, 1)
    beta_condition ~ Normal(0, 0.5)

    # drift rate
    drift = beta_intercept + beta_condition * condition
    # drift ~ filldist(MvNormal(zeros(2), I * 2), 2)

    # Likelihood
    data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    return (; data, drift, A, k, tau)
end

Should the priors over the coefficients be filldist(MvNormal(...)) as well? And if so, how do we include that in a linear formula?
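One possible shape for this (a sketch, not a tested Turing model): stack one coefficient column per accumulator, so a single matrix product yields the full drift matrix. In plain Julia, with hypothetical values:

```julia
# Hypothetical design matrix: one trial in condition A, one in B
X = [1.0 0.0;
     1.0 1.0]

# Hypothetical coefficient matrix: one column per accumulator,
# row 1 = intercepts, row 2 = condition-B effects. In a Turing model this
# might get a prior along the lines of B ~ filldist(MvNormal(zeros(2), I), 2).
B = [ 3.0   2.0;
     -2.0  -0.5]

# Row i of nu is the drift vector [ν₁, ν₂] for trial i.
nu = X * B
```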
