Galaxy101 2
Let's take a look at the history again:
You can see that this history contains all steps of our analysis. By building this history we have actually created a complete record of our analysis, with Galaxy preserving all parameter settings applied at every step. Wouldn't it be nice to convert this history into a workflow that we can execute again and again? This can be done by clicking the history options button and selecting the Extract Workflow option:
The center pane will change as shown below, and you will be able to choose which steps to include or exclude and how to name the newly created workflow. In this case I named it galaxy101-2015:
Once you click Create Workflow, you will get the following message: "Workflow 'galaxy101-2015' created from current history. You can edit or run the workflow".
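Conceptually, extracting a workflow copies the selected jobs, together with their recorded parameters, out of the history. The sketch below illustrates that idea with hypothetical data structures (`HistoryStep`, `extract_workflow`); it is not Galaxy's internal representation. It also assumes, for illustration only, that any dataset consumed by a kept step but produced outside the selection becomes a workflow input.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryStep:
    """One job recorded in a history (hypothetical structure)."""
    step_id: int
    tool: str
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # step_ids whose outputs this job used


def extract_workflow(history, include, name):
    """Build a workflow from the history steps whose ids are in `include`."""
    kept = [step for step in history if step.step_id in include]
    kept_ids = {step.step_id for step in kept}
    # Assumption: datasets produced outside the selection become workflow inputs.
    workflow_inputs = sorted(
        {i for step in kept for i in step.inputs if i not in kept_ids}
    )
    steps = [
        # Copy tool and parameters so the workflow does not share state with the history.
        {"tool": step.tool, "params": dict(step.params), "inputs": list(step.inputs)}
        for step in kept
    ]
    return {"name": name, "inputs": workflow_inputs, "steps": steps}
```

For example, keeping only the analysis steps of a history whose first two items were imported from UCSC leaves those two datasets as the workflow's inputs, while every kept step retains the parameters it was run with.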
Let's click edit (if you click something else and the message in the center pane disappears, you can always access all your workflows, including the one you just created, using the Workflow link at the top of the Galaxy interface). This will open Galaxy's workflow editor (to get this view I clicked the arrow at the lower left corner of the screen, which collapsed the tool pane of the Galaxy interface). The editor lets you examine and change the settings of this workflow, as shown below. Note that the box corresponding to the Select First tool is selected (highlighted with a bluish border) and you can see the parameters of this tool in the right pane. This is how you can view and change the parameters of all tools involved in the workflow:
Among the many things you can do with workflows, I will mention just one. When a workflow is executed, one is usually interested in the final product and not in the intermediate steps. These steps can be hidden by mousing over the small asterisk in the lower right corner of every tool:
Yet there is a catch. In a newly created workflow all steps are hidden by default, and Galaxy's default behavior is that if all steps of a given workflow are hidden, then nothing gets hidden in the history. This may be counterintuitive, but it is done to reduce the amount of clicking if you do want to hide some steps. So in our case, if we want to hide all intermediate steps except the last one, we click the asterisk in the last step of the workflow:
Once you do this, the representation of the workflow in the bottom right corner of the editor will change, with the last step becoming orange. This means that it is the only step that will generate a dataset visible in the history:
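The visibility rule described above can be summarized as a small function. This is a minimal illustration of the behavior as this page describes it, not Galaxy's code; `visible_outputs` and the step names used with it are hypothetical.

```python
def visible_outputs(steps, marked):
    """Return the steps whose outputs remain visible in the history.

    steps:  workflow steps in execution order.
    marked: set of steps whose asterisk has been clicked.
    """
    # Nothing marked: as described above, nothing gets hidden.
    if not marked:
        return list(steps)
    # Otherwise only the marked steps produce visible datasets.
    return [step for step in steps if step in marked]
```

Marking only the last step (Compare two Queries) therefore leaves a single visible dataset, which is exactly the orange step in the editor overview.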
Right now both inputs to the workflow look exactly the same. This is a problem, as it will be very confusing to tell which input should be Exons and which should be SNPs:
In the image above you can see that the top input dataset (the one with the blue border) connects to the Join tool first, so it must correspond to the exon data. If you click on this box (in the image above it is already selected, as it is outlined with the blue border), you will be able to rename the dataset in the right pane:
Then click on the second input dataset and rename it "Features" (this makes the workflow a bit more generic, which will be useful later in this tutorial):
Finally, let's rename the workflow's output. To do this:
- click on the last dataset (Compare two Queries)
- scroll down the rightmost pane and click on
- type Top Exons in the Rename dataset text box:
What we are trying to do here is to design a generic workflow. This means that from time to time you will need to change parameters within this workflow. For instance, in this tutorial we selected the 5 exons containing the highest number of SNPs. But what if you need to select 10? It therefore makes sense to leave these types of parameters adjustable. To do this:
First, select the tool whose parameters you want to set at runtime (Select first in this case):
Next, select the parameter you would like to set at runtime. To do this, just hover over the icon so it looks like this:
and click! Your parameter will now be set at runtime.
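To see why the number of exons is worth exposing as a runtime parameter, here is a minimal Python sketch of what the workflow computes. It assumes, as a simplification, that the workflow joins exons with features, counts the features per exon, keeps the top N, and then uses Compare two Queries to recover the full exon records; the function name `top_exons` and the tuple layouts are hypothetical. `n` plays the role of the Select first parameter: 5 in this tutorial, but adjustable at runtime.

```python
from collections import Counter


def top_exons(exons, joined, n=5):
    """Return the exon records with the n highest feature counts.

    exons:  list of (chrom, start, end, name) tuples.
    joined: list of (exon_name, feature_name) pairs, one per overlap,
            standing in for the output of the join step.
    n:      the runtime parameter (Select first); 5 by default.
    """
    # Count overlapping features per exon.
    counts = Counter(exon_name for exon_name, _feature in joined)
    # Keep the n exons with the largest counts; ties keep the order
    # in which exons first appear in `joined`.
    top = {name for name, _count in counts.most_common(n)}
    # Recover the full exon records (the Compare two Queries step),
    # preserving the original order of the exon list.
    return [exon for exon in exons if exon[3] in top]
```

Calling `top_exons(exons, joined, n=10)` instead of relying on the default is the code equivalent of typing a new value into the runtime field when the workflow is launched.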
Now let's save the changes we've made by clicking the editor's options button and selecting Save:
Now that we have a workflow, let's do something grand: for example, finding the exons with the highest number of repetitive elements across the entire human genome.
First, go back to the analysis view by clicking Analyze Data at the top of the Galaxy interface. Now let's create a new history by clicking the history options button and selecting Create New:
Now let's get coding exons for the entire genome by going to Get Data -> UCSC Main and setting up the parameters as shown below. Note that this time the region radio button is set to genome:
Click get output and you will get the next page (if it looks different from the image below, go back and make sure output format is set to BED - browser extensible format):
Choose Coding exons and click Send query to Galaxy.
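Both UCSC queries in this tutorial are requested in BED format: a tab-separated format whose first three columns are chromosome, start and end, with an optional fourth column for the name. BED coordinates are 0-based and half-open, so an interval `chr22 100 200` covers 100 bases. A minimal parser sketch follows; the `BedInterval` and `parse_bed` names are hypothetical, and only the first four columns are read.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


def parse_bed(lines):
    """Parse BED lines into intervals, skipping blank and header lines."""
    intervals = []
    for line in lines:
        # "track" and "browser" lines are UCSC headers, not intervals.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""  # name column is optional
        intervals.append(BedInterval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals
```

Because the end is exclusive, the length of an interval is simply `end - start`, with no off-by-one correction.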
Go again to Get Data -> UCSC Main and make sure the following settings are selected (in particular, group = Repeats and track = RepeatMasker):
Click get output and you will get the next page (if it looks different from the image below, go back and make sure output format is set to BED - browser extensible format):
Select Whole gene and click Send query to Galaxy.
At this point you will have two items in your history: one with exons and one with repeats. These datasets are large (especially the repeats) and it will take some time for them to turn green. Luckily, you do not have to wait, as Galaxy will automatically start jobs once the uploads have finished. So nothing stops us from starting the workflow we have created. First, click on the Workflow link at the top of the Galaxy interface, mouse over galaxy101-2015, click, and select Run. The center pane will change to let you launch the workflow. Select the appropriate datasets for the Repeats and Exon inputs as shown below. Now scroll to Step 6 and you will see that we can set the Select first parameter at runtime (meaning now!). So let's put 20 in there (or anything else you want), then scroll further down and click the run button to see this:
Once the workflow has started, you will initially be able to see all its steps. Note that you are joining all exons with all repeats, so naturally this will take some time:
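Joining every exon against every repeat naively means comparing each pair, which is why this step is slow on genome-wide data. As an illustration of how overlap counts can be obtained faster, here is a sketch that sorts interval starts and ends per chromosome and uses binary search. This is a swapped-in technique that only counts overlaps; it is not how Galaxy's Join tool is implemented, and `count_overlaps` is a hypothetical name. It relies on the fact that, for non-empty half-open intervals, a feature overlaps an exon exactly when it starts before the exon ends and does not end at or before the exon starts.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_overlaps(exons, features):
    """Count features overlapping each exon (half-open BED coordinates).

    exons, features: iterables of (chrom, start, end, name) with start < end.
    Returns {exon_name: count}.
    """
    starts = defaultdict(list)
    ends = defaultdict(list)
    for chrom, start, end, _name in features:
        starts[chrom].append(start)
        ends[chrom].append(end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()

    counts = {}
    for chrom, start, end, name in exons:
        # Features that begin before the exon ends...
        began = bisect_left(starts[chrom], end)
        # ...minus those that already finished at or before the exon starts.
        finished = bisect_right(ends[chrom], start)
        counts[name] = began - finished
    return counts
```

Every feature that ends at or before the exon's start must also start before the exon's end, so subtracting the two binary-search counts gives the overlap count directly, in O((n + m) log m) time for n exons and m features instead of O(n * m).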
As mentioned above, this will take a while, so go get some coffee. Eventually you will see this:
The two histories and the workflow described on this page are accessible directly from the links below:
- History Galaxy 101 (2015)
- History Exons vs. Repeats
- Workflow Galaxy 101-2015
From there you can import histories and workflows to make them your own. For example, to import the Galaxy 101 (2015) history, simply click this link and select the Import history link:
...you need to complain. Use Galaxy's BioStar Channel to do this.