Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] how to optimize funnels query? #1836

Closed
lamberken opened this issue Jan 30, 2018 · 6 comments
Closed

[Question] how to optimize funnels query? #1836

lamberken opened this issue Jan 30, 2018 · 6 comments

Comments

@lamberken
Copy link
Contributor

lamberken commented Jan 30, 2018

hi, I need to use clickhouse to funnels query

info

  • the table trajectory_trackless_complex has 2894082787 rows
  • pageid: 2538 --> 3242 --> 1682, ( homepage --> listpage --> detailpage )

query sql

SELECT 
    sum(step1_time != 0) AS home, 
    sum(step2_time != 0) AS list, 
    sum(step3_time != 0) AS detail
FROM 
(
    SELECT 
        loginkey, 
        groupArray(pageid) AS events, 
        groupArray(createtime) AS events_times, 
        arrayFilter((time, name) -> (name = 2538), events_times, events)[1] AS step1_time, 
        arrayFilter((time, name) -> ((name = 3242) AND (time >= step1_time) AND (step1_time != 0)), events_times, events)[1] AS step2_time, 
        arrayFilter((time, name) -> ((name = 1682) AND (time >= step2_time) AND (step2_time != 0)), events_times, events)[1] AS step3_time
    FROM 
    (
        SELECT 
            pageid, 
            createtime, 
            loginkey
        FROM datacenter.trajectory_trackless_complex 
        WHERE pageid IN (2538, 3242, 1682)
        ORDER BY createtime ASC
    ) 
    GROUP BY loginkey
) 

Execute result

begining

it seems run very fast
image

after few seconds

it hungs

end

it uses 377sec
image


Question

how to optimize the query ?
how to make it run faster ?
thank you very match

@lamberken
Copy link
Contributor Author

@ztlpn can you give me some advice?

@shangshujie365
Copy link
Contributor

udaf

@sundy-li
Copy link
Contributor

@Jaxvk Uadf, try this

@lamberken
Copy link
Contributor Author

@sundy-li 如果能通过clickhouse语法可以直接实现就很好了, 易观比赛加了一些限定条件可以快速得出结果。我发现它总是卡在一台机器上 98%,这个阶段如果可以并行使用多个机器运行就好了

@alexey-milovidov
Copy link
Member

Have you tried the sequenceMatch aggregate function?

@lamberken
Copy link
Contributor Author

@alexey-milovidov yes, it can't not work

Exception: Pattern application proves too difficult, exceeding max iterations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants