data-science-homework

Assessment for Data Science roles.

Expected completion time: 3 hours. Note that the time next to each question is not a limit but rather a suggestion; feel free to spend more time if you need it.

How do I take this assessment?

Clone this repo onto your local machine.
git clone https://github.com/tilleyand/data-science-homework.git
Answer the questions below by filling out the files in the solutions folder.
Zip your solutions folder.
zip -r solutions.zip solutions
Email solutions.zip to [email protected] (or your recruiter).
You'll receive an email with your results.

*Important Note: Questions 4 and 5 can both be completed in either R or Python. If you fill out both the R and Python function files for a question, we will score both and give you the best result.

What topics are covered?

Mathematics / Problem Solving
Probability
SQL
Data Science (Modeling, Machine Learning)
Applied Mathematics

Question 1. Positive Fractions (Math / Problem Solving) [5 minutes]

The numerator and denominator of Juan’s fraction are positive integers whose sum is 2011. The value of this fraction is less than 1 / 3. Find the greatest such fraction.

Write your answer in question_1.txt in the following format:
numerator
denominator

Question 2. Counting Cards (Probability) [10 minutes]

A standard deck of 52 cards has 12 "face cards" (Jack, Queen, King of each suit). Suppose you're playing a game where you're given six cards. After the dealer gives you the first five cards (which are now removed from the deck), he offers you the chance to bet on the outcome of the sixth card: if it's a face card, you quadruple your money; otherwise, you lose your money. For example, if you bet $1 and win you will end with $4 (your $1 + winnings of $3), but if you bet $1 and lose you will end with $0 (a loss of your $1).

What is the probability that taking this bet will be in your favor?

Write your answer (rounded to 5 decimal points) in question_2.txt.

Hint: The answer is not simply the probability that the sixth card is a face card. Our question is equivalent to asking: "What is the probability that you will be dealt a 5-card hand such that taking the bet is in your favor?"

Question 3. Order Query (SQL) [15 minutes]

Write a SQL query (using SQLite syntax - http://www.sqlite.org/index.html) on the user_orders table (data/user_orders.csv) that returns the following table:

user first_29_day_orders

which contains one row for every user along with how many orders they had in their first 29 days.

Write your query in the format of a SELECT statement.

Note: An order is considered to be in the user's first 29 days if it happened before OR on first_date + days(28). E.g. if the user's first order was on 1/1/2018, an order on 1/29/18 would still count as a first_29_day_order but an order on 1/30/18 would not.

Write your query in question_3.sql.

Question 4. Forecasting (Data Science) [60 minutes]

Write a function in R or Python that takes in a dataset (.csv file; see data/hourly_volume.csv for sample data) and a number of days forward, and generates predictions for hourly order volume. Feel free to use any existing libraries/packages.

Notes: 1) Forecasts should be produced starting at the end of the supplied dataset. E.g. if the dataset contains data up to 2018-01-31 and days_forward is 2, then your function should return predictions for every date in [2018-02-01 00:00:00, 2018-02-02 23:00:00]. 2) Your function should return a data table with only two columns: the first one referring to order hour (formatted like example dates in this comment) and the second one referring to the predicted order volume.

Write your function in question_4.r or question_4.py.

Question 5. Referral Chain (Applied Mathematics) [90 minutes]

Suppose your business has a network of users who are all either Active (1) or Inactive (0) (we call this the user's state). Each user was referred by an existing user, except for the first user. At any given time, each user's likelihood of being Active is a function of their referrer's state along with some threshold t_i.

Specifically, we can define the state of each user with the following equation:

(docs/equation 1.jpg),

where user 1 referred user 2, user 2 referred user 3, user 3 referred user 4, etc.

In this system, each user except for user N has referred exactly one person.

While there are N users, the binary state vector s has cardinality (N + 1) and is indexed at 0. The threshold vector t is a random variable distributed uniformly with support on [0.5, 1].

Write a function which takes in N (the number of users) and s_N (the sample expectation for the last user in the chain; this can also be interpreted as the probability the last user in the chain is Active) and returns the maximum likelihood estimate of the state probability of s_0 (i.e. the probability that s_0 is 1). Please round your answer to 2 decimal places.

*Important Note: In order to receive a score, your function must finish running in under 30 minutes for N <= 5.

Write your function in question_5.r or question_5.py.

If anything on this assessment is unclear, please email questions to [email protected].

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
data		data
docs		docs
solutions		solutions
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

data-science-homework

About

Releases

Packages

Languages

License

tilleyand/data-science-homework

Folders and files

Latest commit

History

Repository files navigation

data-science-homework

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages