Recreate the graphic from the RStudio exercise from the week 2 lab.
Save the output in a file called `mtcars.png`.
- Write an R script that generates an `sqlite3` database containing the `mtcars` data.
- Write a Python script that reads in the data using `pandas` and makes the plot using `seaborn`.
- This "pipeline" will be written and carried out using `snakemake`.
- The pipeline must be robust to change. In other words, if you `touch` any of the inputs, then the work flow should restart from that point and regenerate the necessary outputs. `touch` is a Unix command. If you are not familiar with it, Google it.
- You'll need to read the `snakemake` docs.
- You'll have to figure out how to organize the "rules".
- A correct work flow will generate the final output file starting from a directory containing nothing other than the `Snakefile`, the one R script, and the one Python script.
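For readers who have not met `touch` before: it updates a file's modification time without changing its contents. Comparing modification times is one of the signals build tools commonly use to decide that an output is out of date. The sketch below illustrates that idea in plain Python using only the standard library. The file names `make_db.R` and `mtcars.sqlite3` are hypothetical placeholders, and `is_stale` is a helper invented for this example. It is not part of snakemake and does not claim to reproduce snakemake's own logic.

```python
import os
import tempfile
from pathlib import Path


def is_stale(output: Path, inputs: list[Path]) -> bool:
    """Return True if output is missing or older than any of its inputs."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime_ns
    return any(p.stat().st_mtime_ns > out_mtime for p in inputs)


def demo() -> tuple[bool, bool]:
    """Check staleness before and after 'touching' the input script."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "make_db.R"          # hypothetical name
        database = Path(tmp) / "mtcars.sqlite3"   # hypothetical name
        script.write_text("# placeholder\n")
        database.write_text("placeholder\n")
        # Pin explicit timestamps so the demo does not depend on clock resolution.
        os.utime(script, ns=(1_000_000_000, 1_000_000_000))
        os.utime(database, ns=(2_000_000_000, 2_000_000_000))
        before = is_stale(database, [script])
        # Similar in spirit to `touch make_db.R`: only the timestamp changes.
        os.utime(script, ns=(3_000_000_000, 3_000_000_000))
        after = is_stale(database, [script])
        return before, after


if __name__ == "__main__":
    print(demo())  # (False, True)
```

The database counts as up to date while it is newer than the script, and as stale once the script's timestamp moves past it.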
A correct work flow will only execute the necessary steps when a script/input file is "touched". In other words:
- If you `touch` your R script, the sqlite3 database and figure will be regenerated.
- If you `touch` your sqlite3 database or the Python script, then the figure will be regenerated, but not the database.
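The two cases above can be modelled as a walk through a small dependency graph. The following is a simplified sketch of that reasoning, not snakemake's implementation. Apart from `mtcars.png`, the file names (`make_db.R`, `plot.py`, `mtcars.sqlite3`) are assumptions made for illustration, as is the helper `outputs_to_rebuild`.

```python
# Each output maps to the inputs that produce it (hypothetical file names,
# except mtcars.png). Entries are listed in build order.
DEPENDS_ON = {
    "mtcars.sqlite3": ["make_db.R"],
    "mtcars.png": ["mtcars.sqlite3", "plot.py"],
}


def outputs_to_rebuild(touched: str) -> list[str]:
    """List the outputs downstream of a touched file, in build order."""
    stale = {touched}
    rebuild = []
    # Because DEPENDS_ON is in build order, a single pass is enough here.
    for output, inputs in DEPENDS_ON.items():
        if any(name in stale for name in inputs):
            stale.add(output)
            rebuild.append(output)
    return rebuild


if __name__ == "__main__":
    for name in ("make_db.R", "mtcars.sqlite3", "plot.py"):
        print(name, "->", outputs_to_rebuild(name))
```

Touching the R script marks both the database and the figure for regeneration, while touching the database or the Python script marks only the figure.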
Most of the steps you need are in the material from previous weeks.
You need to discover how to save a `seaborn` plot to a file, though!
- How do you delete all output from a `snakemake` work flow?
- How do you delete output from a single `snakemake` rule?
- What is the citation for `snakemake`?
In a new repository:
- The `Snakefile`
- The R and Python scripts.
- `mtcars.png`
- The `README.md` for the repo should display the image.
- The `README.md` should contain the answers to the questions listed above.
- The `README.md` should contain evidence that `touch`ing the various files does the right thing. You can copy-paste the shell output that happens after you `touch` and rerun the jobs. Place the output in a code fence in the README:

  ```
  Output
  ```

  Do not use screen shots.
The other mechanics are the same. Link to the new work from your homework web page, etc.