More than one join and stdin, part two #403

johnkerl · 2021-01-18T15:36:52Z

I have multiple files with the same structure (== same header); for example, one of the files looks like:

→ mlr --csv cat input_1.csv
zone,label,mean,stddev
1,Barren,0.985039418507162,0.00327046755267665
2,Permanent Snow and Ice,0.990449367088603,0.00347695390530483
3,Water Bodies,0.989689587426295,0.00283130417558745
9,Urban and Built-up Lands,0.975935137657604,0.00444728462815199
10,Dense Forests,0.982209571011,0.00151525916704626
20,Open Forests,0.982498749692162,0.00156407685057156
25,Forest/Cropland Mosaics,0.983158952435782,0.00119917817740868
30,Natural Herbaceous,0.982886083655933,0.00172176084515656
35,Natural Herbaceous/Croplands Mosaics,0.983636363636364,0.000771389215405308
36,Herbaceous Croplands,0.983256814928586,0.00116699486215588
40,Shrublands,0.977095890410958,0.00439150607284168

I can repeat the above example for up to 3 input files:

→ mlr --csv join --ul --ur --lp l --rp r -j zone,label -f input_1.csv then join -j zone,label -f input_2.csv input_3.csv
zone,label,mean,stddev,lmean,lstddev,rmean,rstddev
1,Barren,0.985141452451229,0.00296063409807811,0.985039418507162,0.00327046755267665,0.984987557668154,0.00294390190031405
2,Permanent Snow and Ice,0.990172413793102,0.003303950103698,0.990449367088603,0.00347695390530483,0.989895569620253,0.0036093031143493
3,Water Bodies,0.988460091843363,0.00249696810252014,0.989689587426295,0.00283130417558745,0.988820568927725,0.00259939680954198
9,Urban and Built-up Lands,0.976210534599518,0.00436170313798978,0.975935137657604,0.00444728462815199,0.976246019422661,0.00448461857275749
10,Dense Forests,0.982076308739861,0.00148154340071296,0.982209571011,0.00151525916704626,0.982190048828062,0.00146571376545496
20,Open Forests,0.982497740034809,0.00153034810204273,0.982498749692162,0.00156407685057156,0.982466195761245,0.00150786476199308
25,Forest/Cropland Mosaics,0.983225779156313,0.00117294632959691,0.983158952435782,0.00119917817740868,0.983067448680308,0.00118920277246029
30,Natural Herbaceous,0.982983720528064,0.00166879860717559,0.982886083655933,0.00172176084515656,0.982925841727431,0.00165974257213183
35,Natural Herbaceous/Croplands Mosaics,0.983354838709678,0.00106402725807591,0.983636363636364,0.000771389215405308,0.983575757575758,0.000817620458171498
36,Herbaceous Croplands,0.983352463549144,0.00113906386917183,0.983256814928586,0.00116699486215588,0.983220069731363,0.00116521692353466
40,Shrublands,0.977145547945205,0.00442587899140026,0.977095890410958,0.00439150607284168,0.977166380789022,0.00444267104546084

How about doing this for many more input files?

Originally posted by @NikosAlexandris in #235 (comment)


NikosAlexandris · 2021-01-18T15:46:11Z

I probably don't understand the underlying programmatic structures and their complexity. Still, isn't it common to join multiple identically structured files and rename only the columns that differ in content?

Imaginary example

mlr --csv join --cp -j zone,label -f input*.csv

(--cp as in count prefix, or maybe --pc as in prefix counter)

that would output

zone,label,mean,stddev,mean1,stddev1,mean2,stddev2,mean3,stddev3
1,Barren,0.91,0.01,0.92,0.02,0.93,0.03,0.94,0.04
..
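
To make the requested behaviour concrete, here is a minimal Python sketch of an n-wise join with a counter appended to the non-key columns. It is not Miller code and no --cp flag exists; the names nwise_join and to_csv are hypothetical, and the sketch assumes an inner join in which each key combination appears at most once per file. The sample values are the ones from the imaginary output above.

```python
import csv
import io


def nwise_join(sources, keys):
    """Inner-join CSV sources on `keys`.

    Non-key columns of the first source keep their names; those of the
    i-th later source (i = 1, 2, ...) get the counter i appended.
    Assumes each key combination occurs at most once per source.
    """
    header = list(keys)
    joined = {}  # key tuple -> output row, in first-source order
    for i, src in enumerate(sources):
        reader = csv.DictReader(src)
        suffix = "" if i == 0 else str(i)
        value_fields = [f for f in (reader.fieldnames or []) if f not in keys]
        header.extend(f + suffix for f in value_fields)
        seen = set()
        for rec in reader:
            k = tuple(rec[key] for key in keys)
            if i == 0:
                joined[k] = {key: rec[key] for key in keys}
            elif k not in joined:
                continue  # key absent from an earlier source: unpaired
            for f in value_fields:
                joined[k][f + suffix] = rec[f]
            seen.add(k)
        # Keep only keys that also occurred in this source (inner join).
        joined = {k: row for k, row in joined.items() if k in seen}
    return header, list(joined.values())


def to_csv(header, rows):
    """Render the joined rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Four in-memory "files" with the same header, standing in for input*.csv.
inputs = [
    io.StringIO(f"zone,label,mean,stddev\n1,Barren,0.9{n},0.0{n}\n")
    for n in range(1, 5)
]
header, rows = nwise_join(inputs, ["zone", "label"])
result = to_csv(header, rows)
print(result)
```

With real files, the in-memory sources would be replaced by open file handles, for instance from sorted(glob.glob("input*.csv")).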

johnkerl · 2021-02-09T03:48:25Z

Hi @NikosAlexandris -- sorry for the long delay.

I can indeed see the value of this!!

The original idea of Miller was that -- for all verbs, not just join -- the files matched by input*.csv are read as one long stream. Another example of this is mlr --json count *.json, which counts the records in the whole input stream rather than per file. To get per-file counts, one could instead run for file in *.json; do mlr --json count "$file"; done.

But here even that wouldn't work, since you truly want an n-wise join of n files, which is a great idea -- and not what was implemented. :^/
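
The single-stream versus per-file distinction described above can be illustrated with a small Python sketch. This is only an analogy, not how Miller is implemented: the helper count_records is hypothetical, CSV text is used instead of JSON to stay close to the rest of the thread, and the file contents are a made-up subset of the rows shown earlier.

```python
import csv
import io

# Illustrative in-memory stand-ins for two input files.
files = {
    "input_1.csv": "zone,label\n1,Barren\n2,Permanent Snow and Ice\n",
    "input_2.csv": "zone,label\n1,Barren\n",
}


def count_records(text):
    """Count the data records (header excluded) in one CSV text."""
    return sum(1 for _ in csv.DictReader(io.StringIO(text)))


# One long stream: a single count over all files together.
total = sum(count_records(text) for text in files.values())

# Per-file loop: one count for each file.
per_file = {name: count_records(text) for name, text in files.items()}

print(total)
print(per_file)
```

Neither form relates the records of one file to those of another, which is why an n-wise join needs something beyond both.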

NikosAlexandris · 2021-08-01T12:53:35Z

Imaginary example
mlr --csv join --cp -j zone,label -f input*.csv
(--cp as in count prefix, or maybe --pc as in prefix counter)

That should be --cs or --sc (s as in suffix), since the counter logically goes at the end :-)

johnkerl · 2021-12-31T15:54:09Z

Closing this as a duplicate of #711 (which remains open).

johnkerl mentioned this issue Jan 18, 2021 in More than one join and stdin #235 (Closed)

johnkerl added the wishlist label Feb 9, 2021

johnkerl closed this as completed Dec 31, 2021

johnkerl added the duplicate label Jan 1, 2022
