% !TEX encoding = UTF-8 Unicode
% !TEX TS-program = pdflatex
% !TEX spellcheck = English
{\let~\catcode~13=13\def^^M{^^J}~` 12~`\%12~`~12\xdef\asciiart{
... . .. ..
xH88"`~ .x8X .uef^" @88> x .d88" x .d88" ..
:8888 .f"8888Hf :d88E %8P 5888R 5888R @L
:8888> X8L ^""` `888E . '888R '888R 9888i .dL
X8888 X888h 888E .z8k .@88u 888R 888R `Y888k:*888.
88888 !88888. 888E~?888L ''888E` 888R 888R 888E 888I
88888 %88888 888E 888E 888E 888R 888R 888E 888I
88888 '> `8888> 888E 888E 888E 888R 888R 888E 888I
`8888L % ?888 ! 888E 888E 888E 888R 888R 888E 888I
`8888 `-*"" / 888E 888E 888& .888B . .888B . x888N><888'
"888. :" m888N= 888> R888" ^*888% ^*888% "88" 888
`""***~"` `Y" 888 "" "% "% 88F
J88" 98"
@% ./"
:" ~`
}}\makeatletter
\documentclass[openany]{amsbook}
\numberwithin{equation}{chapter}
\numberwithin{figure}{chapter}
\numberwithin{table}{chapter}
\usepackage{mathtools,amssymb,yhmath}
\allowdisplaybreaks
\usepackage[english]{babel}
\def\U@DEC#1{\bgroup\def\UTFviii@defined##1{\expandafter#1\string##1+}}
\def\U@DEF#1:#2+{\U@LET:#2+{\@nameuse{意#2義}}\@namedef{意#2義}}
\def\U@LET#1:#2+{\egroup\DeclareUnicodeCharacter{\UTFviii@hexnumber
{\decode@UTFviii#2\relax}}}\U@DEC\U@LET令{\U@DEC\U@LET}令定{\U@DEC\U@DEF}
%mathalpha
令Γ\Gamma 令Δ\Delta 令Θ\Theta 令Λ\Lambda 令Ξ\Xi 令Π\Pi 令Σ\Sigma
令Υ\Upsilon 令Φ\Phi 令Ψ\Psi 令Ω\Omega
令α\alpha 令β\beta 令γ\gamma 令δ\delta 令ε\varepsilon 令ζ\zeta
令η\eta 令θ\theta 令ι\iota 令κ\kappa 令λ\lambda 令μ\mu 令ν\nu
令ξ\xi 令π\pi 令ρ\rho 令ς\varsigma 令σ\sigma 令τ\tau 令υ\upsilon
令φ\varphi 令χ\chi 令ψ\psi 令ω\omega 令ϑ\vartheta 令ϕ\phi 令ϖ\varpi
令Ϝ\Digamma 令ϝ\digamma 令ϰ\varkappa 令ϱ\varrho 令ϴ\varTheta 令ϵ\epsilon
% % fonts
令ℂ{\mathbb C} 令𝔻{\mathbb D} 令𝔼{\mathbb E} 令𝔽{\mathbb F} 令𝔾{\mathbb G}
令𝕀{\mathbb I} 令𝕂{\mathbb K} 令ℕ{\mathbb N} 令ℙ{\mathbb P} 令ℝ{\mathbb R}
令𝕌{\mathbb U} 令𝕍{\mathbb V} 令𝕏{\mathbb X} 令ℤ{\mathbb Z}
令𝒞{\mathcal C} 令𝒟{\mathcal D} 令ℰ{\mathcal E} 令𝒢{\mathcal G} 令𝒥{\mathcal J}
令ℳ{\mathcal M} 令𝒩{\mathcal N} 令𝒪{\mathcal O} 令𝒫{\mathcal P} 令𝒮{\mathcal S}
令𝒰{\mathcal U} 令𝒳{\mathcal X} 令𝒴{\mathcal Y}
令𝔈{\mathfrak E}
\DeclareMathAlphabet\mathsi{T1}\sfdefault\mddefault\sldefault
令𝘈{\mathsi A} 令𝘉{\mathsi B} 令𝘊{\mathsi C} 令𝘋{\mathsi D} 令𝘌{\mathsi E}
令𝘍{\mathsi F} 令𝘏{\mathsi H} 令𝘐{\mathsi I} 令𝘑{\mathsi J} 令𝘒{\mathsi K}
令𝘓{\mathsi L} 令𝘔{\mathsi M} 令𝘗{\mathsi P} 令𝘘{\mathsi Q} 令𝘚{\mathsi S}
令𝘛{\mathsi T} 令𝘝{\mathsi V} 令𝘞{\mathsi W} 令𝘟{\mathsi X} 令𝘠{\mathsi Y}
令𝘡{\mathsi Z}
令𝘢{\mathsi a} 令𝘣{\mathsi b} 令𝘤{\mathsi c} 令𝘦{\mathsi e} 令𝘧{\mathsi f}
令𝘨{\mathsi g} 令𝘴{\mathsi s}
%miscellaneous
令ℓ\ell 令∂\partial 令∇\nabla
%mathopen/close
令√\sqrt
\def\bigl@C#1{\bigl#1} \def\bigr@C#1{\bigr#1}
\def\({\bigl@C(} \def\){\bigr@C)} 令({\Bigl(} 令){\Bigr)}
令【{\biggl(} 令】{\biggr)} 令〖{\Biggl(} 令〗{\Biggr)}
令[{\bigl@C[} 令]{\bigr@C]} 令「{\Bigl[} 令」{\Bigr]}
令{{\bigl@C\{} 令}{\bigr@C\}} 令『{\Bigl\{} 令』{\Bigr\}}
令⌈\lceil 令⌉\rceil 令⌊\lfloor 令⌋\rfloor 令⟨\langle 令⟩\rangle
%mathfence
\def\|{\mathrel\Vert} 令‖{\mathrel\Big\Vert} 令|{\mid\nobreak}
令;{\mathrel;\nobreak} 令\{\mathord\setminus} 令、{\setminus\nobreak} 令:{\colon}
%mathaccent
令ˆ\hat 令˜\tilde 令¯\bar 令˘\breve 令˙\dot 令¨\ddot 令°\mathring
令ˇ{^\vee}
%mathop
令∏\prod 令∑\sum 令∫\int 令⋀\bigwedge 令⋁\bigvee 令⋂\bigcap 令⋃\bigcup
令⨁\bigoplus 令⨂\bigotimes
%mathbin
令±\pm 令·\cdot 令×\times 令÷\frac 令†\dagger 令•\bullet 令∘\circ
令∧\wedge 令∨\vee 令∩\cap 令∪\cup 令⊕\oplus 令⊗\otimes 令⋆\star
%mathord
令¬\neg 令…\ldots 令∀\forall 令∁\complement 令∃\exists 令∞\infty
令⊤\top 令⊥\bot 令⋯\cdots 令★\bigstar 令♠\spadesuit
令♡\heartsuit 令♢\diamondsuit 令♣\clubsuit 令♭\flat 令♮\natural
令♯\sharp
%mathrel
令←\gets 令→\to 令↔\leftrightarrow 令↖\nwarrow 令↗\nearrow 令↘\searrow
令↙\swarrow 令↞\twoheadleftarrow 令↠\twoheadrightarrow 令↤\mapsfrom 令↦\mapsto
令↩\hookleftarrow 令↪\hookrightarrow 令↾{\mathbin\upharpoonright}
令∈\in 令∉\notin 令∋\ni 令∼\sim 令≅\cong 令≈\approx 令≔\coloneqq
令≕\eqqcolon 令≠\neq 令≡\equiv 令≤\leqslant 令≥\geqslant 令≪\ll 令≫\gg
令⊆\subseteq 令⋮\vdots 令⋰\adots 令⋱\ddots 令⟂\perp 令⟵\longleftarrow
令⟶\longrightarrow 令⟼\longmapsto
% format misc
定†#1†{\text{#1}} 令…{,\penalty\binoppenalty\dotsc,\penalty\binoppenalty}
令⁰{^0} 令¹{^1} 令²{^2} 令³{^3} 令⁴{^4} 令⁵{^5} 令⁶{^6} 令⁷{^7} 令⁸{^8} 令⁹{^9}
令₀{_0} 令₁{_1} 令₂{_2} 令₃{_3} 令₄{_4} 令₅{_5} 令₆{_6} 令₇{_7} 令₈{_8} 令₉{_9}
% math misc
令㏒\log
令㏑\ln
定W#1{^{(#1)}}
令P{P_\mathrm e}
令R{C-R}
令X{¯X-μ}
令Z{Z_\mathrm{mxd}}
令S{S_\mathrm{max}}
\def\loll{\bsm{1&0\\1&1}}
\def\locl{\bsm{1&0\\c&1}}
\def\Git{G^{-⊤}}
\def\Eo{E_\mathrm 0}
\def\Er{E_\mathrm r}
\def\Ec{\mathsi E_\mathrm c}
\def\Vc{\mathsi V_\mathrm c}
\def\GY{G_†Ye†}
\def\GB{G_†Barg†}
\def\tiltbox#1#2#3{\mkern-#1mu\hbox{\pdfsave
\pdfsetmatrix{1 0 .2 1}\rlap{$#2$}\pdfrestore}\mkern#3mu}
\def\Wt{\tiltbox{2.2}W{21.9}^t}
\def\Xt{\tiltbox0X{18.8}^t}
\def\Yt{\tiltbox{1}Y{16.9}^t}
\def\diff{\mathrm d}
\DeclareMathOperator\hdis{hdis}
\DeclareMathOperator\hwt{hwt}
\DeclareMathOperator\dist{dist}
\DeclareMathOperator\tr{tr}
\DeclareMathOperator\GL{GL}
\DeclareMathOperator\poly{poly}
\DeclareMathOperator*\argmax{arg\,max}
\DeclareMathOperator*\argmin{arg\,min}
\DeclarePairedDelimiter\abs\lvert\rvert
\DeclarePairedDelimiter\norm\lVert\rVert
\def\mat#1{\begin{matrix}#1\end{matrix}}
\def\bma#1{\begin{bmatrix}#1\end{bmatrix}}
\def\bsm#1{[\begin{smallmatrix}#1\end{smallmatrix}]}
\def\cas#1{\begin{cases*}#1\end{cases*}}
% text misc
\def\glet{\global\let}
\let\EA\expandafter
\let\NE\noexpand
\let\AB\allowbreak
% layout misc
定彈#1{\advance#1\glueexpr0ptplus1ptminus1pt}
\usepackage{tikz,tikz-cd,tikz-3dplot}
% PGF
\let\PMP\pgfmathparse
\def\PMR{\pgfmathresult}
\let\PMS\pgfmathsetmacro
\let\PMT\pgfmathtruncatemacro
\let\PMD\pgfmathdeclarefunction
\def\MD@three#1#2#3#4.{\PMP{0x#1#2#3}}\expandafter\MD@three\pdfmdfivesum{\jobname}.
\pgfmathsetseed{\PMR-\day-31*(\month-12*(\year-2000))}
% TikZ
\usetikzlibrary{decorations.pathreplacing,calc,graphs}
\newdimen\tk
\pgfmathsetlength\tk{(2.2*.4+1.6)*2.096774/2}
\tikzset{
every picture/.style={cap=round,join=round},
every pin/.style={anchor=180+\tikz@label@angle,anchor/.code=},
dot/.pic={\fill circle(.6pt);},
pics/y-tick/.style={code={\draw[gray,very thin](0,0)--(-\tk,0)[left]node{#1};}},
pics/x-tick/.style={code={\draw[gray,very thin](0,0)--(0,-\tk)[below]node{#1};}}
}
\PMD{g2o}1{\PMP{1/(5.714-4.714*h2o(#1))}}% denominator = mu+1-mu*h2(#1)
\PMD{h256}1{% := h2(#1/256)
\PMS\comp{256-#1}%
\PMP{(-#1*ln(#1)-\comp*ln(\comp))/177.445678+8}%
}
\PMD{h2o}1{% := h2(sin(#1)^2)
\PMS\sin{sin(#1)}\PMS\cos{cos(#1)}%
\ifdim\sin pt=0pt%
\PMP{0}%
\else\ifdim\cos pt=0pt%
\PMP{0}%
\else%
\PMP{(-\sin*ln(\sin)*\sin-\cos*ln(\cos)*\cos)*2.88539}% 2/ln(2)
\fi\fi%
}
\PMD{h3}1{\PMP{(#1*ln(3*#1)+(1-#1)*ln((1-#1)*1.5))/ln(3)}}
\PMD{h4}1{\PMS\xb{#1*1.58496}\PMP{(\xb*ln(\xb*1.5)+(1-\xb)*ln((1-\xb)*3))/1.09861}}
% x/b = #1/log_32 % ln(3) = 1.09861
% tikz-cd
\tikzcdset{}
\def\cd{\begin{equation*}\catcode`\&=13\cd@aux}
\newcommand\cd@aux[2][]{\begin{tikzcd}[#1]#2\end{tikzcd}\end{equation*}}
\usepackage{pgfplotstable,booktabs,colortbl}
% pgfplots
\pgfplotsset{
compat/show suggested version=false,compat=1.17, % arxiv = 1.17
}
\usepackage{lettrine}
\def\dropcap#1#2 {\lettrine[lraise=.2]#1{#2} }
\def\[#1\]{\begin{equation*}{#1}\end{equation*}}
\usepackage{xurl}
\usepackage[unicode,pdfusetitle,linktocpage]{hyperref}
\hypersetup{colorlinks,linkcolor=red!50!.,citecolor=green!50!.,urlcolor=blue!50!.}
\usepackage{cleveref}
\def\creflastconjunction{, and~}
定名#1:#2~#3?#4 {\crefname{#1}{#2#3}{#2#4}}
名chapter:Chapter~?s 名figure:Figure~?s 名table:Table~?s
定理#1:#2~#3?#4 {\newtheorem{#1}[thmall]{#2#3}名#1:#2~#3?#4 }
\newtheorem{thmall}{ThmAll}[chapter] 理thm:Theorem~?s
理cor:Corollar~y?ies 理lem:Lemma~?s 理pro:Proposition~?s
理con:Conjecture~?s
\theoremstyle{definition} 理dfn:Definition~?s 理exa:Example~?s
\theoremstyle{remark} 理cla:Claim~?s 理rem:Remark~?s
定式#1:#2~#3?#4 {名#1:#2~#3?#4 \creflabelformat{#1}{\textup{(##2##1##3)}}}
式equ:equation~?s 式ine:inequalit~y?ies 式ins:inequalities~?
式for:formula~?s 式app:approximation~?s 式sup:supremum~?s
式seq:sequence~?s
\def\Parse@TL#1:#2?{\label@in@display@optarg[#1]{#1:#2}}
\AtBeginDocument{\def\label@in@display#1{\incr@eqnum\tag{\theequation}\Parse@TL#1?}}
\def\tagby#1{\tag{by \eqref{#1}}}
\def\tagcopy#1{\tag{\eqref{#1}'s copy}}
定行{\catcode13 }行13定讀#1{行13\def^^M{\\}\def\READ@ATTR{行5#1}\@ifnextchar[{選}{參}}
定選[^^M#1^^M]^^M#2^^M^^M{\READ@ATTR[#1]{#2}}定參^^M#1^^M^^M{\READ@ATTR{#1}}行5
讀
\title
Complexity and Second Moment
of
the Mathematical Theory of Communication
讀
\author
Hsin-Po Wang
讀{\def
\@pdfsubject}
94A24; 68P30; math.IT
讀
\subjclass
Primary 94A24; Secondary 68P30
\begin{document}
\message{\asciiart}
\frontmatter
\noindent
\tikz[remember picture,overlay,line width=.1]{
\tikzset{shift={(.5\textwidth,1em+\headsep+\headheight+\topmargin+1in)}}
% origin is top-center
\large
\draw[nodes={below,yshift=1em-.5ex,align=center}]
(0,-2in)node{\noexpand\uppercase\expandafter{\@title}}
(0,-3.5in)node{BY}
(0,-4in)node{\noexpand\uppercase\expandafter{\authors}}
(0,-5.5in)node{
DISSERTATION\\
~\\
Submitted in partial fulfillment of the requirements\\
for the degree of Doctor of Philosophy in Mathematics\\
in the Graduate College of the\\
University of Illinois Urbana-Champaign, 2021
}
(0,-7.5in)node{Urbana, Illinois}
(0,-8in)node[text width=6.5in,align=left]{
Doctoral Committee:\\
~\\
\hskip3.5em Assistant Professor Partha S. Dey, Chair\\
\hskip3.5em Professor Iwan M. Duursma, Director of Research\\
\hskip3.5em Professor József Balogh\\
\hskip3.5em Professor Marius Junge
}
}
\thispagestyle{empty}
\vbadness999\hbadness99\overfullrule1em彈\baselineskip/4彈\lineskip/4彈\parskip/2
\DeclareRobustCommand\gobblefive[5]{}\addtocontents{toc}\gobblefive% no abstract in toc
\chapter*{Abstract}
The performance of an error correcting code is evaluated by
its block error probability, code rate, and encoding and decoding complexity.
The performance of a series of codes is evaluated by,
as the block lengths approach infinity,
whether their block error probabilities decay to zero,
whether their code rates converge to channel capacity,
and whether their growth in complexities stays under control.
Over any discrete memoryless channel, I build codes such that:
for one, their block error probabilities and code rates scale like random codes';
and for two, their encoding and decoding complexities scale like polar codes'.
Quantitatively, for any constants $\pi,\rho>0$ such that $\pi+2\rho<1$,
I construct a series of error correcting codes with
block length $N$ approaching infinity,
block error probability $\exp(-N^\pi)$,
code rate $N^{-\rho}$ less than the channel capacity,
and encoding and decoding complexity $O(N\log N)$ per code block.
Over any discrete memoryless channel, I also build codes such that:
for one, they achieve channel capacity rapidly;
and for two, their encoding and decoding complexities
outperform all known codes over non-BEC channels.
Quantitatively, for any constants $\tau,\rho>0$ such that $2\rho<1$,
I construct a series of error correcting codes with
block length $N$ approaching infinity,
block error probability $\exp(-(\log N)^\tau)$,
code rate $N^{-\rho}$ less than the channel capacity,
and encoding and decoding complexity $O(N\log(\log N))$ per code block.
The two aforementioned results are built upon two pillars---%
a versatile framework that generates codes on the basis of channel polarization,
and a calculus--probability machinery that evaluates the performances of codes.
The framework that generates codes and the machinery that evaluates codes
can be extended to many other scenarios in network information theory.
To name a few:
lossless compression with side information,
lossy compression,
Slepian--Wolf problem,
Wyner--Ziv problem,
multiple access channel,
wiretap channel of type I,
and broadcast channel.
In each scenario, the adapted notions of block error probability and code rate
approach their limits at the same paces as specified above.
\addtocontents{toc}\gobblefive% no dedication in toc
\chapter*{}
First there is Bo-Le%
\footnote{
The honorific name of Sun Yang, a horse tamer
in the Spring and Autumn period renowned as a judge of horses;
also refers to those who recognize (especially hidden) talent.
}
~
Then can horses gallop hundreds of miles
~
Horses capable of galloping far are common
~
But Bo-Les are scarce
~
\hbox{}\hfill---Han Yu, \emph{On Horses}
\let\hold\hfil\let\cold\l@chapter\def\l@chapter{\def\hfil{\dotfill\glet\hfil\hold}\cold}
\tableofcontents
\mainmatter
\chapter{Introduction}
\dropcap
Seventy-three years ago, Claude E. Shannon founded the theory of information
with an article titled \emph{A Mathematical Theory of Communication},
which was later republished under the name
\emph{The Mathematical Theory of Communication} to reflect its omnipotence.
In the eternal work, Shannon explained how to measure the information content
of a random variable $X$ and argued that, in the long term,
the information content carried by $X$ costs $H(X|Y)+ε$ bits to be remembered,
given that we have free access to another random variable $Y$.
Shannon also showed that, if sending $X$ results in the reception of $Y$,
where $X→Y$ is called a communication channel, then the rate at which
information can be transmitted is $I(X;Y)-ε$ bits per channel use.
These results are now called [Shannon's] source coding theorem
and noisy-channel coding theorem, respectively.
The famous article left two loopholes.
Loophole one:
Shannon's proof involves the existence of certain mathematical objects
which, in reality, are next to impossible to find constructively.
As a consequence, Shannon's protocol is never utilized beyond academic interest.
Loophole two:
Whereas Shannon's bound on $ε$ is strong enough to conclude that $ε→0$,
which evinces that $H(X|Y)$ and $I(X;Y)$ are the limits we would like to achieve,
we did not know how rapidly $ε$ decays to $0$.
That is, Shannon identified the first order limit of coding,
but left the second order limit open.
The current dissertation aims to patch the two said loopholes
and succeeds in improving over existing patches.
I will present an $ε→0$ coding scheme whose complexity is $O(N㏒(㏒N))$,
where $N$ is the block length, whilst the best known result is $O(N㏒N)$.
This in turn patches the first loophole further,
and is referred to as the complexity paradigm of coding.
I will also present an $O(N㏒N)$ coding scheme
whose $ε$ decays to zero at a provably optimal pace,
while earlier works only handle the binary case.
This in turn fills the second loophole further,
and is referred to as the second-moment paradigm of coding.
I will then present a joint scheme that achieves
both $O(N㏒(㏒N))$ and the optimal pace of $ε→0$.
My codes are built upon two pillars.
Pillar one:
The overall code can be seen as a modification
of a recently developed code---polar code.
Depending on how we modify polar coding, we can inherit its
$O(N㏒N)$ complexity or reduce it further down to $O(N㏒(㏒ N))$.
Pillar two:
The polarization kernel can be seen as a flag of the legacy codes---random codes.
Since random coding is the only way to achieve the second-moment paradigm,
I incorporate it to boost the performance of
polar coding to the second-moment paradigm.
The two pillars support a coding scheme whose
complexity scales like polar coding but performance scales like random coding.
After mastering the complexity and second-moment paradigms of
the source coding theorem and the noisy-channel coding theorem,
this dissertation moves forward to a network coding scenario
called distributed lossless compression.
I will then adapt the complexity and second-moment paradigms to this scenario.
Using similar techniques, the same result generalizes to
even more coding scenarios such as
multiple access channels,
wiretap channels of type I, and
broadcast channels,
and is left for future research.
\section{Organization of the Dissertation}
The remaining sections of the current chapter map one-to-one to
the remaining chapters of the current dissertation and serve as their summaries.
Among the chapters:
\Cref{cha:origin} was presented in the preprint \cite{ModerateDeviations18}.
\Cref{cha:prune} was presented in the preprints \cite{LoglogTime18}
and \cite{LoglogTime19}, the latter of which was later published in
IEEE Transactions on Information Theory \cite{LoglogTime21}.
\Cref{cha:general} was presented in the preprint \cite{LargeDeviations18}.
\Cref{cha:random} was presented in the preprint \cite{Hypotenuse19}, which was later
published in IEEE Transactions on Information Theory \cite{Hypotenuse21}.
\section{Original Channel Polarization}
In \cref{cha:origin}, we will revisit Arıkan's original proposal
of polar coding that is dedicated to binary-input channels
and uses $\loll$ as the polarization kernel.
This chapter serves three purposes:
One, the construction by $\loll$ is simple yet powerful enough to achieve
capacity (the first order limit), and is of historical significance.
Two, I will provide a complete proof of the strongest version
of the statements available in the literature, unifying
the techniques spanning several state-of-the-art works.
Three, the main statement and its proof are the starting point of at least
three generalizations, and will be referred to as the prototype every now and then.
In the seminal paper, Arıkan started with a symmetric binary-input
discrete-output memoryless channel $W(y|x)$ and synthesized
its children $WW1(y₁y₂|u₁)$ and $WW2(y₁y₂u₁|u₂)$ via
% Arikan09 (19) (20)
\begin{gather*}
WW1(y₁y₂|u₁)≔∑_{u₂∈𝔽₂}÷12W(y₁|u₁+u₂)W(y₂|u₂), \\
WW2(y₁y₂u₁|u₂)≔÷12W(y₁|u₁+u₂)W(y₂|u₂).
\end{gather*}
Treating $•W1$ and $•W2$ as transformations applied to channels,
Arıkan continued synthesizing, in a recursive manner,
$W$'s grandchildren $(WW1)W1$, $(WW1)W2$, $(WW2)W1$, and $(WW2)W2$,
followed by $W$'s great-grandchildren $\((WW1)W1\)W1$, $\((WW1)W1\)W2$, etc.,
followed by their descendants ad infinitum.
Arıkan observed that a code can be established by
selecting a subset of reliable synthetic channels.
To evaluate the performance of codes established this way,
we proceed to examine the stochastic process $\{𝘞_n\}$ defined by
$𝘞₀≔W$ and $𝘞_{n+1}≔(𝘞_n)W{1† or †2† with equal chance†}$.
The evolution of the synthetic channels can be controlled by
$Z(WW2)=Z(W)²$ and $Z(W)√{2-Z(W)²}≤Z(WW1)≤2Z(W)-Z(W)²$.
From that I will prove
\[𝘗{Z(𝘞_n)<e^{-2^{πn}}}>𝘗\{Z(𝘞_n)→0\}-2^{-ρn},\label{ine:teaser-Z}\]
where $(π,ρ)$ is any pair of constants that
lies in the shaded area in \cref{fig:lol-bdmc}.
This inequality tells us how reliable $𝘞_n$ can be.
Thus we learn how the original polar coding performs.
\section{Asymmetric Channels}
In \cref{cha:dual}, I will bring up a “dual picture” of \cref{cha:origin}.
The dual picture consists of three elements:
One, through examining the behavior of $𝘞_n$ when it becomes noisy,
i.e., when $Z(𝘞_n)≈1$, we have a more complete
understanding of $\{𝘞_n\}$ as a stochastic process.
Two, the proof of said behavior is the mirror image of the one
given in \cref{cha:origin}, reinforcing the duality.
Three, this result is pivotal to source coding for lossy compression,
to noisy-channel coding over asymmetric channels,
and to a pruning technique that will be covered in upcoming chapters.
While \cref{ine:teaser-Z} addresses the behavior of $𝘞_n$ at the reliable end,
another parameter $T$ and a stochastic process $\{T(𝘞_n)\}$
are defined to examine the behavior of $𝘞_n$ at the noisy end.
This parameter satisfies $T(WW1)=T(W)²$ and $T(WW2)≤2T(W)-T(W)²$,
and can be used to show that
\[𝘗{T(𝘞_n)<e^{-2^{πn}}}>𝘗\{T(𝘞_n)→0\}-2^{-ρn},\label{ine:teaser-T}\]
where $(π,ρ)$ is any pair of constants that
lies in the \emph{same} shaded area in \cref{fig:lol-bdmc}.
Note that $T(𝘞_n)→0$ iff $Z(𝘞_n)→1$, so \cref{ine:teaser-T} is the
mirror image of \cref{ine:teaser-Z}, telling us how noisy $𝘞_n$ can be.
\Cref{ine:teaser-Z} alone implies that polar coding is good for error correction
over symmetric channels and lossless compression (with or without side information).
\Cref{ine:teaser-T} alone implies that polar coding is good for lossy compression.
Together, \cref{ine:teaser-Z,ine:teaser-T} imply that
(a) polar coding is good over asymmetric channels, and
(b) polar coding can be modified to attain an even lower complexity.
More on (b) in \cref{cha:prune}.
The proofs of \cref{ine:teaser-Z,ine:teaser-T} involve monitoring
a “watch list” subset $𝘈_m$ of moderately reliable channels
for $m=√n,2√n…n-√n$ for some large perfect square $n$.
When $𝘞_m∈𝘈_m$ is moderately good and $𝘞_{m+√n}$ becomes extraordinarily reliable,
$𝘞_{m+√n}$ is moved from $𝘈_{m+√n}$ to another “trustworthy” subset $𝘌_{m+√n}$.
On the other hand, when $𝘞_{m+√n}$ becomes noisy, it is temporarily removed
from $𝘈_{m+√n}$, but has some chance to be added back to $𝘈_{m+l√n}$
(for some $l≥2$) once $𝘞_{m+l√n}$ becomes moderately reliable again.
The rest of the proof is to quantify moderate and extraordinary
reliabilities and the corresponding frequencies.
\section{Pruning Channel Tree}
In \cref{cha:prune}, I will introduce a novel technique called pruning.
Pruning reduces the complexity of the encoder and decoder
while retaining in part the performance of polar codes.
It was previously reported that pruning reduces the complexity only by a constant factor.
In this chapter, I will show that pruned polar codes can
achieve capacity with encoding and decoding complexity $O(N㏒(㏒N))$,
transcending the old $O(N㏒N)$ complexity.
We will see in later chapters that pruning is a special case of
dynamic kerneling and can be applied to more general polar codes.
To explain pruning, I will establish the trinitarian correspondence among
the encoder/decoder, the channel tree, and the channel process $\{𝘞_n\}$.
Pruning the channel tree corresponds to trimming the unnecessary part
of the encoder and decoder, which reduces complexity.
Viewed from a different perspective, pruning the channel tree
corresponds to declaring a stopping time $𝘴$ that is adapted to
$\{𝘞_n\}$ so that the stochastic process $\{𝘞_{n∧𝘴}\}$ stabilizes
whenever channel transformation becomes ineffective.
Now $𝘞_𝘴$ becomes a random variable associated to code performance.
For all intents and purposes, it suffices to declare
the stopping time $𝘴$ properly and prove inequalities of the form
\begin{gather*}
𝘗{Z(𝘞_𝘴)<4^{-n}}>1-H(W)-2^{-ρn}, \\
𝘗{T(𝘞_𝘴)<4^{-n}}>H(W)-2^{-ρn}
\end{gather*}
and another inequality of the form
\[N𝘌[𝘴]≤O(N㏒(㏒N)).\]
The first two inequalities imply that the code achieves capacity;
the third inequality confirms the complexity being $O(N㏒(㏒N))$.
\section{General Alphabet and Kernel}
In \cref{cha:general}, I will investigate thoroughly
two known ways to generalize polar coding.
One is allowing channels with arbitrary finite input alphabet.
This extends polar coding to all channels Shannon had considered in 1948.
The other is utilizing arbitrary matrices as polarizing kernels.
Doing so provably improves the performance in the long run,
and reportedly improves the performance at moderate block lengths.
To begin, we will go over four regimes that connect
probability theory, random coding theory, and polar coding theory:
\begin{itemize}
\item Law of large numbers (LLN) and achieving capacity;
this regime concerns whether block error probability $P$
decays to $0$ while code rate $R$ converges to capacity.
\item Large deviation principle (LDP) and error exponent;
this regime concerns how fast $P$ decays to $0$ when an $R$ is fixed.
\item Central limit theorem (CLT) and scaling exponent; this regime
concerns how fast $R$ approaches capacity when $P$ is fixed.
\item Moderate deviation principle (MDP);
this regime concerns the general trade-off between $P$ and $R$.
\end{itemize}
Next, I will go back to prove results regarding polar coding.
For any matrix $G$ over any finite field $𝔽_q$,
the LDP data of $G$ include coset distances $D_ZWj≔\hdis(r_j,R_j)$,
where $\hdis$ is the Hamming distance, $r_j$ is the $j$th row of $G$,
and $R_j$ is the subspace spanned by the rows below $r_j$.
Coset distances are such that
\[Z(WWj)≈Z(W)^{D_ZWj}.\]
This approximation is used to control small $Z(𝘞_n)$,
which eventually proves a generalization of \cref{ine:teaser-Z}.
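For example, for Arıkan's kernel $G=\loll$ over $𝔽₂$,
the first row $r₁=(1,0)$ lies at Hamming distance $1$ from $R₁$,
the subspace spanned by the row $(1,1)$ below it,
while the second row $r₂=(1,1)$ lies at Hamming distance $2$
from the zero subspace $R₂$, so
\[D_ZW1=1,\qquad D_ZW2=2,\]
and the approximation reads $Z(WW1)≈Z(W)$ and $Z(WW2)≈Z(W)²$,
in line with the relations $Z(WW2)=Z(W)²$ and $Z(WW1)≤2Z(W)-Z(W)²$ quoted earlier.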
For the dual picture, there is a parameter $S$ generalizing $T$ and satisfying
\[S(WWj)≈S(W)^{D_SWj},\]
where $D_SWj≔\hdis(c_j,C_j)$ is the Hamming distance from $c_j$, the $j$th column
of $G^{-1}$, to $C_j$, the subspace spanned by the columns to the left of $c_j$.
This eventually proves a generalization of \cref{ine:teaser-T}.
The CLT data of an $ℓ×ℓ$ matrix $G$ consist of a choice of parameter $H$, a concave
function $h:[0,1]→[0,1]$ such that $h(0)=h(1)=0$, and a number $ϱ$ such that
\[÷1{ℓ}∑_{j=1}^ℓh(H(WWj))≤ℓ^{-ϱ}h(H(W)),\]
where $H$ could be the conditional entropy
or any other handy parameter that maximizes $ϱ$.
The contribution of \cref{cha:general} is a calculus--probability machinery
that predicts the MDP behavior of polar codes given the LDP and CLT data.
The prediction is of the form
\[𝘗{Z(𝘞_n)<e^{-ℓ^{πn}}}>1-H(W)-ℓ^{-ρn},\]
where $(π,ρ)$ lies to the left of the convex envelope of $(0,ϱ)$
and the convex conjugate of $t↦㏒_ℓ\(ℓ^{-1}∑_{j=1}^ℓ(D_ZWj)^t\)$.
This generalizes \cref{ine:teaser-Z}.
Similarly, the generalization of \cref{ine:teaser-T} reads
\[𝘗{S(𝘞_n)<e^{-ℓ^{πn}}}>H(W)-ℓ^{-ρn},\]
where $(π,ρ)∈[0,1]²$ to the left of the convex envelope of $(0,ϱ)$
and the convex conjugate of $t↦㏒_ℓ\(ℓ^{-1}∑_{j=1}^ℓ(D_SWj)^t\)$.
\section{Random Dynamic Kerneling}
In \cref{cha:random}, two novel ideas are invoked to help
achieve the optimal MDP behavior with low complexity.
First, the matrix $G$ that induces polarization is not fixed
but varying on a channel-by-channel basis.
Second, since it is difficult to prove that a specific $G$ is good,
a random variable $𝔾$ is to replace $G$ and I will investigate
the typical behavior of $𝔾$ as a polarizing kernel.
The MDP behavior of random coding, which is provably optimal, reads
\[÷{-㏑P}{N(C-R)²}→÷1{2V},\]
where $C$ is channel capacity and $V$ is another
intrinsic parameter called channel dispersion or varentropy.
Our target behavior is less impressive,
yet it is asymptotically optimal in the logarithmic scale:
\[÷{㏑(-㏑P)}{㏑(N(C-R)²)}≈1.\]
Or equivalently, for any $π+2ρ<1$, there are codes
such that $P<\exp(-N^π)$ and $C-R<N^{-ρ}$.
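Indeed, substituting $P=\exp(-N^π)$ and $C-R=N^{-ρ}$ gives
$㏑(-㏑P)=π㏑N$ and $㏑(N(C-R)²)=(1-2ρ)㏑N$,
so the ratio above equals $π/(1-2ρ)$,
which approaches $1$ from below as $π+2ρ$ approaches $1$.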
For the typical LDP behavior of $𝔾$, we need to understand,
for each $j$, the typical Hamming distance $D_ZWj$ from
its $j$th row to the subspace spanned by the rows below.
This step is essentially the Gilbert--Varshamov bound with
slight modifications so that it is easier to manipulate in later steps.
For the typical CLT behavior of $𝔾$, I choose the concave function
$h(x)≔\min(x,1-x)^{ℓ/㏑ℓ}$.
Now we need to understand the typical behavior of
\[÷1{ℓ}∑_{j=1}^ℓh(H(WWj)),\]
where $WWj$ is a random variable depending on $𝔾$.
This boils down to showing that the first few $H(WWj)$ are close to $1$,
while the last few $H(WWj)$ are close to $0$.
To show that $H(WWj)≈1$ and to quantify the approximation,
I reduce this to a reliability analysis of noisy-channel coding.
To show that $H(WWj)≈0$ and to quantify the approximation,
I reduce this to a secrecy analysis of wiretap-channel coding.
\section{Joint Pruning and Kerneling}
In \cref{cha:joint}, I will combine the techniques
in \cref{cha:prune,cha:general} to depict a trade-off between
the complexity---ranging from $O(N㏒N)$ to $O(N㏒(㏒N))$---%
and the decay of $P$---ranging from $\exp(-N^π)$ to $\exp(-(㏒N)^τ)$.
The main idea is to apply the stopping time analysis to
any channel process $\{𝘞_n\}$ whose MDP behavior is known.
It could be a process generated by $\locl$,
for which $ϱ$ is known to be positive.
It could be a process generated by a large kernel
whose $ϱ$ is bounded by some other method.
It could also be generated by random dynamic kerneling, for which we know $ϱ→1/2$.
The result is that, for any kernel, polar codes have
the same gap to capacity before and after pruning;
and, depending on how aggressively one prunes, the complexity per bit
is approximately the logarithm of the logarithm of the reciprocal of the block error probability.
For example, if the targeted block error probability is $\exp(-N^π)$,
then the predicted complexity is $O(N㏒N)$.
This recovers the result of the previous chapter.
On the other hand, if the targeted block error probability is $\exp(-(㏒N)^τ)$,
then the predicted complexity is $O(N㏒(㏒N))$.
This is, by far, the lowest complexity for
capacity-achieving codes over generic channels.
Plus, the gap to capacity decays to $0$ optimally fast.
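The arithmetic behind the two predictions is the same:
if the complexity per information bit scales like $㏑(-㏑P)$,
then a target of $P=\exp(-N^π)$ costs about $π㏑N$ per bit, hence $O(N㏒N)$ in total,
whereas a target of $P=\exp(-(㏒N)^τ)$ costs about $τ㏑(㏒N)$ per bit,
hence $O(N㏒(㏒N))$ in total.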
\section{Distributed Lossless Compression}
In \cref{cha:dislession}, I will extend the theorems established
in the previous chapters to distributed lossless compression problems.
A distributed lossless compression problem
is a network coding problem where there are $m$ sources,
each compressed by its own compressor;
the compressors do not talk to each other,
and a single decompressor attempts to reconstruct all sources.
I will go over the two-source case as a warm up,
the three-source case to demonstrate the difficulty,
and finally the $m$-source case for a general result.
I will explain that, modulo the previous chapters,
the main challenge is to reduce a multiple-sender problem
to several one-sender problems.
The reduction consists of two steps.
The first step is to demonstrate that a random source $X$ can be “split”
into two random fragments $X⟨1⟩$ and $X⟨2⟩$ such that there is a bijection
$X↔(X⟨1⟩,X⟨2⟩)$ and hence they carry the same amount of information.
The second step is to show that, by interleaving the fragments of sources in a way
that is related to Gray codes, we can fine-tune the workloads of every sender.
That helps us achieve every possible distribution of workloads.
A key to the second step is degree theory, a piece of machinery from
algebraic topology that determines the surjectivity of a continuous map.
Degree theory offers a sufficient condition for a map to be onto
the dominant face of a contra-polymatroid.
Here, it is the rate region of a distributed lossless compression problem
that is a contra-polymatroid.
Dually, the capacity region of a multiple access channel
is a polymatroid and a similar argument applies.
This fact indicates that, for both distributed lossless compression and
multiple access channels, splitting coupled with polar coding achieves
the optimal block error probability and the optimal gap to boundary
at the cost of $O(N㏒N)$ complexity.
Or, following the complexity paradigm, one prunes the complexity to $O(N㏒(㏒N))$
if a slightly higher block error probability is acceptable.
\chapter{Original Channel Polarization}\label{cha:origin}
\dropcap
Fifteen years ago, Erdal Arıkan developed a technique,
called \emph{channel combining and splitting}, to combine two identical channels
and then split them into two distinct channels \cite{Arikan06}.
At the cost of having to prepare different codes to deal with
distinct channels, the two new channels enjoy better metrics.
More precisely, the average of the cutoff rates rises.
Arıkan then argued that, by recursively synthesizing
the children of the children of $\dotso$ of the channel,
the rise in cutoff rates eventually pushes them towards the channel capacity.
As it turns out, after a sufficient amount of recursion, one does not need
$2^n$ different coding schemes to deal with the $2^n$ descendants of $W$.
This is because most synthetic channels are either
satisfactorily reliable---so we just transmit plain messages through these---%
or desperately noisy---so we just ignore those.
The phenomenon is named \emph{channel polarization}
and the corresponding coding scheme \emph{polar coding}.
Arıkan showed that this original polar coding achieves channel capacity.
That is, if you follow Arıkan's instructions to construct codes, then $P→0$ and
$R→I(W)$ over any symmetric binary-input discrete-output memoryless channel.
This is done by proving, for some functions $θ(n)$ and $γ(n)$,
(a) that
\[𝘗{Z(𝘞_n)<θ(n)}>I(W)-γ(n),\]
(b) that $2^nθ(n)$ is an upper bound on $P$, and
(c) that $γ(n)$ is an upper bound on $I(W)-R$.
In this chapter, I will characterize the pace of achieving capacity.
We will see that
\[𝘗{Z(𝘞_n)<e^{-2^{πn}}}>I(W)-2^{-ρn}\]
if $(π,ρ)∈[0,1]²$ lies to the left of the convex envelope of
$(0,1/4.714)$ and $1-($the binary entropy function$)$.
Prior to my work, the largest known region of achievable $(π,ρ)$
was considerably smaller, reaching only $(0,1/5.714)$ \cite{MHU16}.
See \cref{fig:lol-bdmc} for plots.
\begin{figure}
\tikz[scale=12]{
\draw[gray,very thin]
(0,1/4.714)pic{y-tick=$1/4.714$}(0,1/5.714)pic{y-tick=$1/5.714$}
(0,0)pic{y-tick=$0$}pic{x-tick=$0$}(1/2,0)pic{x-tick=$1/2$}
plot[domain=25:55,samples=90]({sin(\x)^2},{1-h2o(\x)});
\draw[fill=gray]
(0,1/4.714)--plot[domain=40.443:45,samples=30]({sin(\x)^2},{1-h2o(\x)})-|cycle;
\draw
plot[domain=0:45,samples=90]({g2o(\x)*sin(\x)^2},{(1-g2o(\x))/4.714});
\draw[->](0,0)--(0,.3)node[left]{$ρ$};
\draw[->](0,0)--(.7,0)node[right]{$π$};
\draw(.4208,.01818)pic{dot}coordinate[pin=45:{$(0.4208,0.0182)$}];
}
\caption{
The achievable region of $(㏒(-㏒₂P),㏒(C-R))$ is shaded.
The curve part is $1$ minus the binary entropy function.
The straight part is the tangent line from $(0,1/4.714)$ to the curve,
the tangent point being $(0.4208,0.0182)$.
The lower left curve is the previous result \cite{MHU16},
which attains $(0,1/5.714)$.
}\label{fig:lol-bdmc}
\end{figure}
\section{Problem Setup and Primary Definitions}
After introducing some definitions,
this section describes the main problem to be solved in this chapter.
First goes the definition of the channels we want to attack.
\begin{dfn}
A \emph{symmetric binary-input discrete-output memoryless channel} (SBDMC)
is a Markov chain $W:𝔽₂→𝒴$, where
\begin{itemize}
\item $𝔽₂$ is the finite field of order $2$;
\item $𝒴$ is a finite set;
\item $W(y|x)∈[0,1]$, for $x∈𝔽₂$ and $y∈𝒴$, is an array of transition \\
probabilities satisfying $∑_{y∈𝒴}W(y|x)=1$ for both $x∈𝔽₂$; and
\item there exists an involution $σ:𝒴→𝒴$
such that $W(y|0)=W(σ(y)|1)$ for all $y∈𝒴$.
\end{itemize}
\end{dfn}
Denote by $Q$ the uniform distribution on $𝔽₂$;
treat this as the input distribution of the channel $W$.
Denote by $W(x,y)$ the joint probability $Q(x)W(y|x)$.
Denote by $W(x|y)$ the posterior probability;
note that $W(•|•)$ assumes two interpretations,
depending on whether we want to predict $y$ from $x$ or the other way around.
When it is necessary, $W(y)≔W(0,y)+W(1,y)$ denotes the output probability.
Capital variables $X$ and $Y$, usually with indices,
denote the input and output of the channel governed by $Q$ and $W$.
The definitions of some channel parameters follow.
\begin{dfn}
The \emph{conditional entropy} of $W$ is
\[H(W)≔-∑_{x∈𝔽₂}∑_{y∈𝒴}W(x,y)㏒₂W(x|y),\]
which is the amount of noise/equivocation/ambiguity/fuzziness caused by $W$.
\end{dfn}
\begin{dfn}
The \emph{mutual information} of $W$ is
\[I(W)≔H(Q)-H(W)=∑_{x∈𝔽₂}∑_{y∈𝒴}W(x,y)㏒₂÷{W(x|y)}{Q(x)},\]
which is also the channel capacity of $W$.
\end{dfn}
\begin{dfn}\label{dfn:bin-Z}
The \emph{Bhattacharyya parameter} of $W$ is
\[Z(W)≔2∑_{y∈𝒴}√{W(0,y)W(1,y)},\]
which is twice the Bhattacharyya coefficient between
the joint distributions $W(0,•)$ and $W(1,•)$.
\end{dfn}
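For instance, writing $†BEC†(ε)$ for the binary erasure channel with
erasure probability $ε$ and $†BSC†(p)$ for the binary symmetric channel with
crossover probability $p$, direct computation from the three definitions gives
\begin{gather*}
H(†BEC†(ε))=ε, \qquad I(†BEC†(ε))=1-ε, \qquad Z(†BEC†(ε))=ε, \\
H(†BSC†(p))=h₂(p), \qquad I(†BSC†(p))=1-h₂(p), \qquad Z(†BSC†(p))=2√{p(1-p)},
\end{gather*}
where $h₂$ denotes the binary entropy function.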
The overall goal is to construct, for some large $N$, an encoder $ℰ:𝔽₂^{RN}→𝔽₂^N$
and a decoder $𝒟:𝒴^N→𝔽₂^{RN}$ such that the composition
\cd[every arrow/.append style=mapsto]{
U₁^{RN}\rar{ℰ} & X₁^N\rar{W^N} & Y₁^N\rar{𝒟} & ˆU₁^{RN}
}
is the identity map as frequently as possible,
and $R$ as close to the channel capacity $I(W)$ as possible.
\begin{dfn}
Call $N$ the block length.
Call $R$ the code rate.
Denote by $P$, called the block error probability,
the probability that $ˆU₁^{RN}≠U₁^{RN}$.
\end{dfn}
To reach the overall goal of constructing good error correcting codes,
I will introduce the building block of all techniques
we are to utilize---channel transformation.
\section{Channel Transformation and Tree}
This section motivates and defines the channel transformation.
For the precise connection between the transformation and the actual encoder/decoder
design, please refer to Arıkan's original work \cite{Arikan09}.
Let $G∈𝔽₂^{2×2}$ be the matrix
\[\bma{1&0\\1&1}.\]
Let $U₁,U₂∈𝔽₂$ be two independent uniform random variables.
Let $X₁²∈𝔽₂²$ be the vector
\[\bma{X₁&X₂}≔\bma{U₁&U₂}\bma{1&0\\1&1},\]
or $X₁²≔U₁²G$ for short.
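Written out, $X₁=U₁+U₂$ and $X₂=U₂$,
which is where the arguments $u₁+u₂$ and $u₂$ in the upcoming definition come from.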
Let $Y₁,Y₂∈𝒴$ be the outputs of two i.i.d.\ copies
of $W$ given the inputs $X₁$ and $X₂$, respectively.
Then the combination of the two $W$'s is the channel
with input $U₁²$ and output $Y₁²$.
To split the combination of the channels, consider a two-step guessing job:
\begin{itemize}
\item Guess $U₁$ given $Y₁²$.
\item Guess $U₂$ given $Y₁²$ and the previous guess $ˆU₁$,
assuming that $ˆU₁$ is correct.
\end{itemize}
Pretend that there is a channel $WW1$ with input $U₁$ and output $Y₁²$;
this channel captures the difficulty of the first step.
Pretend also that there is a channel $WW2$ with input $U₂$ and output $Y₁²U₁$;
this channel captures the difficulty of the second step.
The precise definitions follow.
\begin{dfn}
Define synthetic channels
\begin{gather*}
WW1(y₁²|u₁)≔∑_{u₂∈𝔽₂}÷12W(y₁|u₁+u₂)W(y₂|u₂), \\
WW2(y₁²u₁|u₂)≔÷12W(y₁|u₁+u₂)W(y₂|u₂).
\end{gather*}
\end{dfn}
Clearly $WW1$ and $WW2$ are of binary input and discrete output.
It can be shown that they are symmetric, hence are SBDMCs.
Therefore, the channel transformation $W↦(WW1,WW2)$
maps the set of SBDMCs to the Cartesian square thereof.
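As a concrete example, take $W$ to be the binary erasure channel with
erasure probability $ε$.
Recovering $U₁=X₁+X₂$ from $Y₁²$ requires both outputs to survive,
whereas recovering $U₂$ from $Y₁²U₁$ succeeds whenever at least one output
survives, because $U₂=X₂$ and also $U₂=X₁+U₁$.
Hence, after merging the outputs that carry no information about the input,
\[WW1≅†BEC†(2ε-ε²), \qquad WW2≅†BEC†(ε²),\]
and in particular $Z(WW2)=Z(W)²$ and $Z(WW1)=2Z(W)-Z(W)²$ in this special case;
for a general SBDMC the former remains an equality
while the latter becomes an upper bound.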
Once we accept the idea that the guessing jobs can be modeled as channels,
we can talk about manipulating the channels
as if they were actual objects instead of describing, abstractly,
the change in ways we are guessing the random variables.
Particularly, we can easily imagine that the channel transformation
applies recursively and gives birth to descendants
$WW{j₁}$, $(WW{j₁})W{j₂}$, $((WW{j₁})W{j₂})W{j₃}$, and so on and so forth.
This family tree of synthetic channels
rooted at $W$ is called the \emph{channel tree}.
To construct a code, choose a large integer $n$ and synthesize the depth-$n$
descendants of $W$, which are of the form $\(\dotsb((WW{j₁})W{j₂})\dotsb\)W{j_n}$.
Select a subset of those channels, which is equivalent to
selecting a subset of indices $(j₁,j₂…j_n)∈\{1,2\}^n$.
Call this subset $𝒥$.
Then by transmitting messages through
the synthetic channels in $𝒥$, a code is established.
This code has block length $N=2^n$, code rate $R=\abs{𝒥}/2^n$,
and block error probability upper bounded by
\[P≤∑_{j₁^n∈𝒥}Z\(\(\dotsb((WW{j₁})W{j₂})\dotsb\)W{j_n}\).\label{ine:P<sum}\]
This is the union bound over the first-error events of
successive cancellation decoding \cite{Arikan09}.
In all papers I have seen, no upper bound on $P$
other than \cref{ine:P<sum} was used.
So we may pretend that the right-hand side of
\cref{ine:P<sum} is the design block error probability of $𝒥$.
To construct good codes, it suffices to collect
in $𝒥$ synthetic channels with small $Z$.
But the more we collect, the higher the sum of $Z$'s.
This induces a trade-off between $P$ and $R$,
which is the subject of the current chapter.
Let $θ$ be the collecting threshold;
that is, $𝒥$ collects synthetic channels whose $Z$ falls below $θ$.
Then $θ$ parametrizes the trade-off in the sense that $P<Nθ$
and $R$ is the density of the synthetic channels whose $Z$ falls below $θ$.
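For a toy illustration, take $W$ to be the binary erasure channel with
erasure probability $1/2$, whose capacity is $1/2$, and take $n=2$, so $N=4$.
By the erasure-channel example above,
the four depth-$2$ descendants have Bhattacharyya parameters
\[÷1{16},\qquad ÷7{16},\qquad ÷9{16},\qquad ÷{15}{16}.\]
The threshold $θ=1/8$ collects only the best descendant,
giving $R=1/4$ and a design block error probability of at most $1/16$;
raising $θ$ to $1/2$ collects two descendants,
pushing $R$ up to $1/2$ but also pushing the bound \cref{ine:P<sum} up to $1/2$.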
In the next section, I will introduce some stochastic processes
that help us comprehend the trade-off between $R$ and $P$.
\section{Channel and Parameter Processes}
We are to define some stochastic processes whose randomness
is independent of the channel noise and of the user messages.
To help distinguish the new source of randomness,
I typeset the relevant symbols (such as $𝘗,𝘌$) in sans serif font.
\begin{dfn}
Let $𝘑₁,𝘑₂,\dotsc$ be i.i.d.\ tosses of a fair coin with sides $\{1,2\}$.
That is,
\[𝘑_n≔\cas{
1 & w.p. $1/2$, \\
2 & w.p. $1/2$.
}\]
Let $𝘞₀,𝘞₁,𝘞₂,\dotsc$, or $\{𝘞_n\}$ in short,
be a stochastic process of SBDMCs defined as follows:
\begin{itemize}
\item $𝘞₀≔W$; and
\item $𝘞_{n+1}≔𝘞_nW{𝘑_{n+1}}$.
\end{itemize}
This is called the \emph{channel process}.
\end{dfn}
\begin{dfn}
Let $\{𝘏_n\}$ be the stochastic process obtained by applying $H$ to $\{𝘞_n\}$.
That is, $𝘏_n≔H(𝘞_n)$.
It is called \emph{Arıkan's martingale}.
\end{dfn}
\begin{dfn}
Let $\{𝘡_n\}$ be the stochastic process obtained by applying $Z$ to $\{𝘞_n\}$.
That is, $𝘡_n≔Z(𝘞_n)$.
It is called \emph{Bhattacharyya's supermartingale}.
\end{dfn}
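For instance, when $W$ is a binary erasure channel,
every $𝘞_n$ is again (equivalent to) a binary erasure channel and
$𝘡_{n+1}$ equals $𝘡_n²$ or $2𝘡_n-𝘡_n²$ with equal chance, so
\[𝘌[𝘡_{n+1}\mid𝘡_n]=÷12𝘡_n²+÷12(2𝘡_n-𝘡_n²)=𝘡_n;\]
in this special case Bhattacharyya's supermartingale is in fact a martingale.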
The remainder of this section is devoted to explaining that Arıkan's martingale