Understanding the output

There is a list of calls that differ in number and results. What should I look at?

Indeed, there is a bit of a list:

Single bin, bin test
Single bin, aneuploidy test
Windowed, bin test
Windowed, aneuploidy test
Chromosome wide, aneuploidy test

Well, it depends on how strict you want to be when looking at the data. In general, the list right after Windowed, aneuploidy test should give the most reliable results for aneuploidy calling, while the Windowed, bin test attempts to list the areas on the chromosomes that are actually aberrated. The Single bin methods are not very reliable because they are not sensitive enough for most samples: instead of combining the power of several bins, they take just a single bin into account. The calls they make are usually strongly deviating values, while most fetal aberrations do not deviate that much.
The last one, Chromosome wide, aneuploidy test, appears to be way too sensitive for calling aneuploidy cases, but it has shown its use in a set of samples that had far too little fetal DNA: it somehow pointed us to the right samples, as their aberrated chromosomes showed up as more deviating values than usual with this approach. Do not, I repeat, do NOT apply this test with a threshold of 3 to call aneuploid chromosomes. Instead, use it as a pointer to chromosomes that might be insufficiently deviating in other tests and need someone to look at the data and determine whether an aneuploidy was missed.

Chromosome 19 seems aberrated!

That is just a downside of this method: chromosome 19 is a nuisance. It contains a lot of genes and, because of that, it is relatively GC-rich. This causes the read frequency on this chromosome to behave differently from other chromosomes, giving WISECONDOR a very hard time finding bins that behave alike. In general, calls on chromosome 19 can be ignored, as a fetus is unlikely to survive an aberration on this chromosome.

There is always this warning about the Average Allowed Deviation, what does it mean?

This line: Average allowed deviation: XX.XX% WARNING: High value (>5%) calls are unreliable often pops up, even for the better samples. It is a single value that tries to give an impression of how reliable the sample is by calculating how much the read frequency needs to deviate, on average, before a call is made. To get this number, we calculate the standard deviation for every tested bin (based on the reference bins for that bin), triple it, determine the percentage difference in read depth this tripled standard deviation corresponds to, and average that value over all bins within the sample. It was meant to tell whether the applied tests make sense given how much the sample deviates from the reference set in general.
Right now, the warning is a bit too sensitive: it shows up even if the results are perfectly fine. Do take care though, values of 14% and higher usually mean the sample does give WISECONDOR some trouble. A better threshold has yet to be determined.
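To make the calculation a bit more concrete, here is a minimal sketch of how such a number could be computed. It is not WISECONDOR's actual code: the function name, the input layout (one list of reference bin read frequencies per tested bin) and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def average_allowed_deviation(reference_values_per_bin, multiplier=3.0):
    """Return the average allowed deviation as a percentage.

    reference_values_per_bin is a hypothetical input: one list per tested
    bin, holding the read frequencies of that bin's reference bins.
    """
    percentages = []
    for reference_values in reference_values_per_bin:
        expected = mean(reference_values)
        spread = stdev(reference_values)  # sample standard deviation (assumption)
        # How far, in percent of the expected read depth, a bin has to
        # deviate before it crosses the tripled standard deviation.
        percentages.append(multiplier * spread / expected * 100.0)
    # Average the per-bin percentages over all tested bins.
    return mean(percentages)


# Made-up read frequencies: one tight and one loose set of reference bins.
example_bins = [
    [100.0, 102.0, 98.0, 101.0, 99.0],
    [50.0, 55.0, 45.0, 52.0, 48.0],
]
print(f"Average allowed deviation: {average_allowed_deviation(example_bins):.2f}%")
```

The loose second bin dominates the average here, which is the behaviour described above: the more the reference bins of a sample disagree with each other, the further a bin has to move before it can be called.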

But what if the reference set was messy?

The number increases if the tested sample's read depth behaviour structurally differs from the reference set. A 'messy' reference set and a 'stable' sample do not necessarily produce high numbers here; the number just tells you how much the sample structurally differs from the reference set.
A high number may be caused by a very stable reference set and a messy sample though, as a stable set may provide wrong information to WISECONDOR: bins that appear to behave the same only behave the same within this stable set.

The plots are all the same size, it's hard to recognize chromosomes

True, during development I just wanted a quick overview of my results without wasting any space, so all chromosomes are scaled to the same width. An alternative was proposed by S.Ghesquiere, which shows every chromosome at its real length, plus additional, detailed plots for chromosomes 13, 18 and 21 next to their cytobands. We are looking into this approach and may incorporate it in the future. In the meantime, you are free to fork this project and make a pull request; all your input is welcome.

What do the lines in Z-Score plots actually show?

The deviation of any bin compared to its reference. The blue line shows the deviations tested per bin. To get a rough idea of what this means, consider it this way: a high spike shows that the bin is strongly increased compared to what it should be, relative to the deviation that is expected. A bin whose set of reference bins has a lot of variation will therefore show a smaller spike than a bin that increased just as much but has a more stable set of reference bins. It is not a direct comparison and it does not show the actual increase for any area; it shows how WISECONDOR looks at that area. Plots that show actual read frequencies for bins do not provide a lot of information, as the small change in read depth caused by a fetal aberration often gets completely drowned out by natural fluctuations in the read depth data.
The red line shows the windowed approach, which basically just combines a set of results shown in the blue line and determines how much this set deviates. A group of deviating bins shown in blue will therefore result in a strongly deviating red line, hence the visual correlation between the two.
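The effect of reference bin variation on the blue line can be illustrated with a plain Z-score. This is only a sketch under assumptions: the function name and the numbers are made up, and WISECONDOR's own implementation may differ in details such as how the standard deviation is estimated.

```python
from statistics import mean, stdev


def bin_z_score(test_value, reference_values):
    """Z-score of one tested bin against the values of its reference bins."""
    return (test_value - mean(reference_values)) / stdev(reference_values)


# Two hypothetical bins with the same expected value (100) and the same
# observed increase (106), but different variation in their reference bins.
stable_reference = [100.0, 101.0, 99.0, 100.5, 99.5]
noisy_reference = [100.0, 110.0, 90.0, 105.0, 95.0]

z_stable = bin_z_score(106.0, stable_reference)
z_noisy = bin_z_score(106.0, noisy_reference)

print(f"stable reference bins: z = {z_stable:.2f}")
print(f"noisy reference bins:  z = {z_noisy:.2f}")
```

Both bins increased by the same amount, yet the one with the stable reference bins produces a far higher spike. That is why the plot shows how WISECONDOR looks at an area rather than the actual increase in read depth.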

I found a 10-20 Mb area using the sliding window, but the plots only show a huge narrow spike

That is considered an artifact. If it shows up in just one sample for that area, the pregnant woman is likely the cause of this:
If a maternal CNV is large enough to cover more than one bin and makes up a relatively large part of the bins' total covered area, it will appear as an aberrated area in WISECONDOR. The windowed method removes the highest bin from its window, but two subsequent spiking bins will still leave the window with a strongly deviating value, which will then influence the total Z-Score for all bins close to it.
If the spike does show up in several samples, the reference set used may seem more stable in this area than the tested samples. This structural artifact can be removed by adding more samples to the reference set, allowing WISECONDOR to learn about the spikiness in this area.
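Why one spiking bin is harmless while two neighbouring ones are not can be shown with a small sketch of a windowed score. Several things here are assumptions and not taken from WISECONDOR itself: the function name, the window size, picking the "highest" bin by absolute value, and combining the remaining per-bin Z-scores with a Stouffer-style sum divided by the square root of the bin count.

```python
from math import sqrt


def windowed_z_score(z_scores, index, half_width=2):
    """Combine the per-bin Z-scores around `index` into one windowed score."""
    start = max(0, index - half_width)
    end = min(len(z_scores), index + half_width + 1)
    window = list(z_scores[start:end])
    if len(window) < 2:
        raise ValueError("window needs at least two bins")
    # Drop the single most extreme bin so one lone spike cannot cause a call
    # (choosing it by absolute value is an assumption).
    window.remove(max(window, key=abs))
    # Stouffer-style combination of the remaining bins (assumption).
    return sum(window) / sqrt(len(window))


# Made-up per-bin Z-scores: one lone spike versus two adjacent spikes.
one_spike = [0.1, -0.2, 9.0, 0.3, -0.1]
two_spikes = [0.1, -0.2, 9.0, 8.5, -0.1]

print(f"one spike:  {windowed_z_score(one_spike, 2):.2f}")
print(f"two spikes: {windowed_z_score(two_spikes, 2):.2f}")
```

With a single spike, the removed bin takes the whole deviation with it and the window stays close to zero. With two adjacent spikes, the second one remains in the window and pushes the combined score past a threshold of 3, and every window that contains both bins is affected in the same way.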

If you run into issues, please create a ticket so I can take care of it.

If you have other trouble running WISECONDOR or any related questions, feel free to contact me through the e-mail address on my GitHub page.