-
Notifications
You must be signed in to change notification settings - Fork 46
Resampling up a piddle
Imagine a data structure like so
my $in = [
[1 0 8]
[6 3 5]
[3 0 5]
[2 4 2]
];
Given
my $n = 5;
I want to convert it into the following piddle
my $out = [
[1 1 1 1 1 0 0 0 0 0 8 8 8 8 8]
[1 1 1 1 1 0 0 0 0 0 8 8 8 8 8]
[1 1 1 1 1 0 0 0 0 0 8 8 8 8 8]
[1 1 1 1 1 0 0 0 0 0 8 8 8 8 8]
[1 1 1 1 1 0 0 0 0 0 8 8 8 8 8]
[6 6 6 6 6 3 3 3 3 3 5 5 5 5 5]
[6 6 6 6 6 3 3 3 3 3 5 5 5 5 5]
[6 6 6 6 6 3 3 3 3 3 5 5 5 5 5]
[6 6 6 6 6 3 3 3 3 3 5 5 5 5 5]
[6 6 6 6 6 3 3 3 3 3 5 5 5 5 5]
[3 3 3 3 3 0 0 0 0 0 5 5 5 5 5]
[3 3 3 3 3 0 0 0 0 0 5 5 5 5 5]
[3 3 3 3 3 0 0 0 0 0 5 5 5 5 5]
[3 3 3 3 3 0 0 0 0 0 5 5 5 5 5]
[3 3 3 3 3 0 0 0 0 0 5 5 5 5 5]
[2 2 2 2 2 4 4 4 4 4 2 2 2 2 2]
[2 2 2 2 2 4 4 4 4 4 2 2 2 2 2]
[2 2 2 2 2 4 4 4 4 4 2 2 2 2 2]
[2 2 2 2 2 4 4 4 4 4 2 2 2 2 2]
[2 2 2 2 2 4 4 4 4 4 2 2 2 2 2]
];
More generically, given a 2d array and 'n', I want each element in each array to repeat 'n' times, both in rows and in columns. So, 'n' in the example shown above is 5. There are several ways of accomplishing this.
(A) find an appropriate existing routine specifically for rescaling, courtesy Matthew Kenworthy
use PDL::Image2D;
$out = zeroes( ( pdl($in->dims) * $n )->list );
rescale2d($in, $out);
(B) use direct indexing, courtesy Craig DeForest
$coords = ndcoords( long, ( pdl($in->dims) * $n )->list ) / $n;
$out = $in->range($coords);
(C) use the threading engine and dummy dimensions to do the job.
Matthew Kenworthy's solution with dummy dims:
$ou = PDL->new_from_specification( $in->dims, $n, $n );
$ou .= $in;
$out = $ou->reorder(2,0,3,1)->reshape( $in->dim(0)*$n, $in->dim(1)*$n );
Chris Marshall's solution with clump:
use PDL::NiceSlice;
$out = $in(*$n,:,*$n,:)->clump(0,1)->clump(1,2);
A simple Benchmark test with an input piddle of 5000 x 3000 with n = 10 yielded the following results
rescale2d took 0 wallclock secs ( 0.22 usr + 0.11 sys = 0.33 CPU)
direct_indexing took 5 wallclock secs ( 3.11 usr + 1.62 sys = 4.73 CPU)
dummy took 0 wallclock secs ( 0.28 usr + 0.23 sys = 0.51 CPU)
clump took 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
The fastest method, clump(), uses the threading engine, which is quite fast, and produces a full-size output copy only once. The rescale2d and reshape methods both make at least two copies of the data and probably break L2 cache. Direct indexing can be generalized beyond this affine expansion but is very slow -- mostly because of the additional memory overhead (and CPU cache breakage) required to calculate and access the index variable, which is several times larger than the data.