<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>AWS - Redshift</title>
<link href="2020/12/31/markdown/AWS/AWS2021/redshift/"/>
<url>2020/12/31/markdown/AWS/AWS2021/redshift/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc</a></p><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h2><ul><li>Massively parallel, shared-nothing, columnar architecture</li></ul><h2 id="best-practices-encoding-compression"><a class="markdownIt-Anchor" href="#best-practices-encoding-compression"></a> Best Practices: Encoding & Compression</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=657" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=657</a></p><ul><li>Use AZ64</li></ul><h3 id="basics"><a class="markdownIt-Anchor" href="#basics"></a> Basics</h3><ul><li>blocks (1 MB immutable blocks, each encoded with a single encoding)</li><li>zone maps</li><li>sort key</li></ul><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=787" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=787</a></p><h2 id="best-practices-sort-keys"><a class="markdownIt-Anchor" href="#best-practices-sort-keys"></a> Best Practices: Sort Keys</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=941" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=941</a></p><ul><li>Compound key: lowest-cardinality columns first</li><li>Use a script to help find the sort key</li><li>Define sort keys on large tables, using four or fewer columns</li></ul><h2 id="best-practice-materialize-columns"><a class="markdownIt-Anchor" href="#best-practice-materialize-columns"></a> Best Practice: Materialize columns</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1001" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1001</a></p><ul><li>Frequently filtered and unchanging dimension values should be materialized within fact tables.</li></ul><h2 id="basics-slice-data-distribution"><a
class="markdownIt-Anchor" href="#basics-slice-data-distribution"></a> Basics: Slice, Data Distribution</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1114" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1114</a></p><h2 id="best-practices-table-design-summary"><a class="markdownIt-Anchor" href="#best-practices-table-design-summary"></a> Best practices: table design summary</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1455" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1455</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Redshift </tag>
</tags>
</entry>
<entry>
<title>AWS - Reference Case, API First</title>
<link href="2020/12/29/markdown/AWS/AWS2021/DataAnalytics_Airflow/"/>
<url>2020/12/29/markdown/AWS/AWS2021/DataAnalytics_Airflow/</url>
<content type="html"><![CDATA[<p><a href="https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/</a></p><p><a href="https://aws.amazon.com/blogs/containers/how-affirm-uses-aws-fargate-and-apache-airflow-to-manage-batch-jobs/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/containers/how-affirm-uses-aws-fargate-and-apache-airflow-to-manage-batch-jobs/</a></p>]]></content>
<tags>
<tag> Airflow </tag>
</tags>
</entry>
<entry>
<title>AWS - BlogList</title>
<link href="2020/12/29/markdown/AWS/AWS2021/awsblog-index/"/>
<url>2020/12/29/markdown/AWS/AWS2021/awsblog-index/</url>
<content type="html"><![CDATA[<blockquote></blockquote><p><a href="https://aws.amazon.com/blogs/big-data/accessing-and-visualizing-external-tables-in-an-apache-hive-metastore-with-amazon-athena-and-amazon-quicksight/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/accessing-and-visualizing-external-tables-in-an-apache-hive-metastore-with-amazon-athena-and-amazon-quicksight/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/setting-up-automated-data-quality-workflows-and-alerts-using-aws-glue-databrew-and-aws-lambda/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/setting-up-automated-data-quality-workflows-and-alerts-using-aws-glue-databrew-and-aws-lambda/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/amazon-emr-studio-preview-a-new-notebook-first-ide-experience-with-amazon-emr/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/amazon-emr-studio-preview-a-new-notebook-first-ide-experience-with-amazon-emr/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-by-running-amazon-emr-notebooks-programmatically/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-by-running-amazon-emr-notebooks-programmatically/</a></p>]]></content>
<tags>
<tag> AWS Blog </tag>
</tags>
</entry>
<entry>
<title>AWS - HPC</title>
<link href="2020/08/05/markdown/AWS/AWS2020/Solution_HPC/"/>
<url>2020/08/05/markdown/AWS/AWS2020/Solution_HPC/</url>
<content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><ul><li>AWS re:Invent 2019: [REPEAT 1] HPC on AWS: Innovating without infrastructure constraints (CMP204-R1)</li></ul><blockquote><p><a href="https://youtu.be/g70bvcGlPY4" target="_blank" rel="noopener">https://youtu.be/g70bvcGlPY4</a></p></blockquote><ul><li>AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cloud( CMP318 )</li></ul><blockquote><p><a href="https://youtu.be/x7M3m1jZ7L8" target="_blank" rel="noopener">https://youtu.be/x7M3m1jZ7L8</a></p></blockquote><ul><li><a href="https://youtu.be/0bGZdqx6w1Q" target="_blank" rel="noopener">https://youtu.be/0bGZdqx6w1Q</a></li><li><a href="https://youtu.be/tHylCR0NIwU" target="_blank" rel="noopener">https://youtu.be/tHylCR0NIwU</a></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> HPC </tag>
</tags>
</entry>
<entry>
<title>AWS - KMS</title>
<link href="2020/08/03/markdown/AWS/AWS2020/Security_KMS/"/>
<url>2020/08/03/markdown/AWS/AWS2020/Security_KMS/</url>
<content type="html"><![CDATA[<ul><li>With AWS managed keys you can’t control key rotation; the keys are rotated every 3 years.</li><li>With customer managed keys (CMKs) you can turn on automatic rotation for symmetric keys; they are then rotated every year.</li><li>Symmetric CMKs and the private keys of asymmetric CMKs never leave KMS unencrypted.</li><li>How to choose between symmetric and asymmetric keys:</li></ul><blockquote><p><a href="https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-choose.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-choose.html</a></p></blockquote>]]></content>
<tags>
<tag> AWS </tag>
<tag> KMS </tag>
</tags>
</entry>
<entry>
<title>LoraWAN</title>
<link href="2020/07/18/markdown/AWS/AWS2020/LoraWAN/"/>
<url>2020/07/18/markdown/AWS/AWS2020/LoraWAN/</url>
<content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><p><a href="https://youtu.be/8Oxcp9wQQnk" target="_blank" rel="noopener">https://youtu.be/8Oxcp9wQQnk</a></p><h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><h2 id="lora-vs-lorawan"><a class="markdownIt-Anchor" href="#lora-vs-lorawan"></a> LoRa vs LoRaWAN</h2><ul><li>LoRa is the protocol, <strong>Lo</strong>ng <strong>Ra</strong>nge; LoRa is Layer 2</li><li>LoRaWAN is the IoT solution based on LoRa technology</li></ul><h2 id="lora-procons"><a class="markdownIt-Anchor" href="#lora-procons"></a> LoRa Pros/Cons</h2><ul><li>Open ISM frequencies (433, 868, 915 MHz); free, no license required</li><li>Interference; low data rate</li></ul><h2 id="limitations-parameters"><a class="markdownIt-Anchor" href="#limitations-parameters"></a> Limitations / Parameters</h2><p>Target: transmit messages over about 10 km with a battery that lasts for 2 years.</p><ul><li>Frequency: pay attention to the band requirements of each country</li><li>Tx power (transmission power): 2-14 dBm / 5-20 dBm; the higher the power, the longer the distance the signal can cover</li><li>Bandwidth (125/250/500 kHz): the higher the bandwidth, the more data can be included in one transmission, but also the shorter the battery life, the shorter the range and the more interference (??); check the local laws</li><li>Spreading factor (7-12): the larger the spreading factor, the longer the distance and the shorter the battery life.</li><li>Coding rate: 4/5, 4/6, 4/7, 4/8<br>4/5 means every 4 bits of data are sent as 5 bits, with 1 extra bit for error correction.
The more redundancy in the coding rate, the longer the distance your data can travel, but the shorter the battery life.</li></ul><h2 id="lora-device"><a class="markdownIt-Anchor" href="#lora-device"></a> LoRa Device</h2><ul><li><p>LoRa nodes:<br>Normally integrate a sensor, transponder and microcontroller all together.<br>They receive and transmit sensor data, sending it over the air using the LoRa protocol.</p><ul><li>LoPy ; LORA GPS Hat; RN2483</li></ul></li><li><p>Gateway:<br>Receives LoRa data on multiple channels with different frequencies and sends the data on to the IP network.</p><ul><li>IMST IC880A-SPI (8 channels at a time)</li></ul></li></ul><h2 id="lorawan"><a class="markdownIt-Anchor" href="#lorawan"></a> LoRaWAN</h2><p>Layer 3 and 4 ;</p>]]></content>
<tags>
<tag> IoT </tag>
<tag> LoraWAN </tag>
</tags>
</entry>
<entry>
<title>AWS - Security</title>
<link href="2020/06/16/markdown/AWS/AWS2020/WAR_Security/"/>
<url>2020/06/16/markdown/AWS/AWS2020/WAR_Security/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/u6BCVkXkPnM" target="_blank" rel="noopener">https://youtu.be/u6BCVkXkPnM</a></p><h1 id="aws-reinforce-2019-security-best-practices-the-well-architected-way-sdd318"><a class="markdownIt-Anchor" href="#aws-reinforce-2019-security-best-practices-the-well-architected-way-sdd318"></a> AWS re:Inforce 2019: Security Best Practices the Well-Architected Way (SDD318)</h1><h2 id="incident-response"><a class="markdownIt-Anchor" href="#incident-response"></a> Incident response</h2><p><a href="https://d1.awsstatic.com/whitepapers/aws_security_incident_response.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/aws_security_incident_response.pdf</a></p><p>Playbook vs Runbook: a runbook has more details</p><p><a href="https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_finding-types-active.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_finding-types-active.html</a></p><ul><li>predefined queries against CloudWatch Events</li></ul><h2 id="iam"><a class="markdownIt-Anchor" href="#iam"></a> IAM</h2><ul><li><p>SSO<br><a href="https://aws.amazon.com/blogs/security/how-to-establish-federated-access-to-your-aws-resources-by-using-active-directory-user-attributes/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/security/how-to-establish-federated-access-to-your-aws-resources-by-using-active-directory-user-attributes/</a></p></li><li><p>Permission boundaries</p></li><li><p>Automation</p></li><li><p>A role in Account 1 assumes a role in Account 2 (hands-on)</p></li></ul><h2 id="management"><a class="markdownIt-Anchor" href="#management"></a> Management</h2><p>Detective Control</p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Security </tag>
</tags>
</entry>
<entry>
<title>AWS - Reference Case, API First</title>
<link href="2020/02/23/markdown/AWS/AWS2020/APIFirst/"/>
<url>2020/02/23/markdown/AWS/AWS2020/APIFirst/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/TKgml4bSiZA" target="_blank" rel="noopener">https://youtu.be/TKgml4bSiZA</a></p><h2 id="key-take-away"><a class="markdownIt-Anchor" href="#key-take-away"></a> Key Take Away</h2><ul><li>No IT / Business separation</li><li>Cross functional teams</li><li>Born agile (DevOps)</li><li>TDD , automation and ChatOps</li><li>Customer-centric design</li><li>CD</li></ul><h2 id="archi"><a class="markdownIt-Anchor" href="#archi"></a> Archi</h2><h1 id="reference-openbanking-with-hsbc"><a class="markdownIt-Anchor" href="#reference-openbanking-with-hsbc"></a> Reference Openbanking with HSBC</h1><blockquote></blockquote><p><a href="https://youtu.be/QNM9LVV_eI0" target="_blank" rel="noopener">https://youtu.be/QNM9LVV_eI0</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> API First </tag>
</tags>
</entry>
<entry>
<title>AWS - Protection Ring</title>
<link href="2020/01/24/markdown/BackToBasic/Security/ProtectionRing/"/>
<url>2020/01/24/markdown/BackToBasic/Security/ProtectionRing/</url>
<content type="html"><![CDATA[<p><a href="https://en.wikipedia.org/wiki/Protection_ring" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/Protection_ring</a></p>]]></content>
<tags>
<tag> Security </tag>
<tag> Protection Rings </tag>
</tags>
</entry>
<entry>
<title>AWS - Kinesis</title>
<link href="2020/01/18/markdown/AWS/AWS2021/Kinesis/"/>
<url>2020/01/18/markdown/AWS/AWS2021/Kinesis/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/jKPlGznbfZ0" target="_blank" rel="noopener">https://youtu.be/jKPlGznbfZ0</a></p><h2 id="why-streaming"><a class="markdownIt-Anchor" href="#why-streaming"></a> Why Streaming</h2><ul><li>Data loses value quickly over time<ul><li>“Time critical decisions” need streaming data</li><li>ingest data as it’s generated, process it on the fly and do real-time analytics/ML/Alert/Action</li></ul></li><li>Common streaming use cases<ul><li>Smart home / automation / log / Data Lake / IoT</li></ul></li><li>Real-time analytics demo (User Dashboard)</li></ul><h2 id="streams-producers-and-consumers"><a class="markdownIt-Anchor" href="#streams-producers-and-consumers"></a> Streams Producers and Consumers</h2><h3 id="producer-limits"><a class="markdownIt-Anchor" href="#producer-limits"></a> Producer limits</h3><ul><li>bandwidth limit: 1MB/sec/shard</li><li>record limit: 1k records/sec/shard; if your messages are small, aggregate them so that you are bounded by bandwidth rather than by record count</li></ul><h3 id="normal-consumer"><a class="markdownIt-Anchor" href="#normal-consumer"></a> Normal consumer</h3><ul><li><p>The slowest consumer also affects the number of shards: you might need to add shards so that the slowest consumer can process messages concurrently and keep up with all of them</p></li><li><p>The fastest you can fetch data is one transaction per 200ms</p></li><li><p>Multiple consumers share the 5 transactions/sec/shard and 2MB/sec/shard limits.</p><ul><li>Multiple consumers will decrease the throughput as well as increase your latency</li></ul></li><li><p>Workaround: use a master stream and a copied slave stream</p></li></ul><h3 id="enhanced-fan-out"><a class="markdownIt-Anchor" href="#enhanced-fan-out"></a> Enhanced Fan out</h3><ul><li>uses HTTP/2: the consumer subscribes and data is pushed to it</li><li>each consumer gets a dedicated 2MB/sec/shard;
message latency can be 15ms</li></ul><h2 id="comcast-streaming"><a class="markdownIt-Anchor" href="#comcast-streaming"></a> Comcast streaming</h2><ul><li>As a platform, design topics for teams to share and communicate</li><li>Use API Gateway to register the stream</li></ul><p><a href="https://comcastsamples.github.io/KinesisShardCalculator/" target="_blank" rel="noopener">https://comcastsamples.github.io/KinesisShardCalculator/</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Kinesis </tag>
</tags>
</entry>
<entry>
<title>AWS - Design MQTT Topics for AWS IoT Core</title>
<link href="2020/01/02/markdown/AWS/AWS2020/BestPracticesMQTT/"/>
<url>2020/01/02/markdown/AWS/AWS2020/BestPracticesMQTT/</url>
<content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><p><a href="https://d1.awsstatic.com/whitepapers/Designing_MQTT_Topics_for_AWS_IoT_Core.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/Designing_MQTT_Topics_for_AWS_IoT_Core.pdf</a></p><h1 id="mqtt-communication-patterns"><a class="markdownIt-Anchor" href="#mqtt-communication-patterns"></a> MQTT Communication Patterns</h1><ul><li>Point to Point<ul><li>each device subscribes to the topic relevant to itself</li></ul></li><li>Broadcast<ul><li>multiple devices subscribe to the same topic</li></ul></li><li>Fan-in<ul><li>multiple devices publish to the same topic</li><li>avoid using fan-in to a single end device (?); use fan-in to route messages from a large fleet via the IoT Rules Engine.<ul><li>because this routing may hit a non-adjustable limit on a single device MQTT connection (!!!)</li></ul></li></ul></li></ul><h1 id="mqtt-communication-patterns-2"><a class="markdownIt-Anchor" href="#mqtt-communication-patterns-2"></a> MQTT Communication Patterns</h1><ul><li>device to device</li><li>device to cloud</li><li>cloud to device<ul><li>include session information for tracking purposes</li></ul></li><li>device to/from users</li></ul><h1 id="mqtt-design-best-practices"><a class="markdownIt-Anchor" href="#mqtt-design-best-practices"></a> MQTT Design Best Practices</h1><h2 id="general-best-practices"><a class="markdownIt-Anchor" href="#general-best-practices"></a> General Best Practices</h2><ul><li>topic levels: use only lowercase letters, numbers and dashes</li><li>order topic levels from general to specific</li><li>include any relevant routing information in the topic</li><li>use a <strong>prefix</strong> to distinguish data and command topics</li><li>document the topic structure as part of your operational practices</li><li>use the IoT Thing name as the MQTT client ID; this makes it easy to correlate for logging and policy purposes</li><li>include the Thing Name in any MQTT message published by a thing or sent to a specific
thing</li><li>review the limitations<ul><li><a href="https://docs.aws.amazon.com/general/latest/gr/iot-core.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/general/latest/gr/iot-core.html</a></li></ul></li><li>include contextual information in payload messages</li><li>avoid fan-in to a single device: do not allow a single device to subscribe to a shared topic (!!!)</li><li>never allow a device to subscribe to all topics (#); use the single-level wildcard (+) for IoT Rules</li></ul><h2 id="best-practices-for-telemetry"><a class="markdownIt-Anchor" href="#best-practices-for-telemetry"></a> Best Practices for Telemetry</h2><ul><li>IoT Basic Ingest for Telemetry<ul><li>the topic is designed to help route to different rules in the rules engine (no need for device-to-device)</li></ul></li><li>Traditional MQTT topics</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dt/&lt;application&gt;/&lt;context&gt;/&lt;thing-name&gt;/&lt;dt-type&gt;</span><br></pre></td></tr></table></figure><p>&lt;application&gt;: useful for version switching<br>&lt;context&gt;: grouping, for example a device group id<br>&lt;dt-type&gt;: subcomponent of the device / sensors</p><h2 id="best-practices-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-commands"></a> Best Practices for Commands</h2><ul><li><p>IoT Shadow</p></li><li><p>AWS IoT Shadow is the preferred AWS IoT Service for implementing individual device commands.</p></li><li><p>AWS IoT Device Jobs(?) 
should be used for fleet-wide operations as it provides extra benefits, such as Amazon CloudWatch metrics for Job tracking, and the ability to track multiple in-transit Jobs for a single device.</p></li><li><p>You can use a combination of the AWS IoT Shadow, AWS IoT Job documents(?), and standard MQTT topics to support your command use cases.</p></li></ul><h2 id="best-practices-for-using-the-aws-iot-shadow"><a class="markdownIt-Anchor" href="#best-practices-for-using-the-aws-iot-shadow"></a> Best Practices for Using the AWS IoT Shadow</h2><ul><li>Don’t share a shadow between devices</li><li>Use the shadow for infrequent state changes or commands, on the order of minutes, hours or days.</li><li>Use the shadow for storing status metrics of the device</li><li>Use the shadow for the firmware version (major.minor.patch)</li><li>Use the clientToken field for tracking purposes</li></ul><h2 id="best-practices-for-using-iot-jobs-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-using-iot-jobs-for-commands"></a> Best Practices for using IoT Jobs for commands</h2><p>An IoT Job contains the instructions that the thing must run to complete its transaction.</p><ul><li>Use thing groups with AWS IoT Jobs<ul><li>update all things that run a certain firmware</li></ul></li><li>Use staged rollouts with Device Jobs</li></ul><h2 id="best-practices-for-using-mqtt-topics-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-using-mqtt-topics-for-commands"></a> Best Practices for using MQTT Topics for commands</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cmd/&lt;application&gt;/&lt;context&gt;/&lt;destination-id&gt;/&lt;req-type&gt;</span><br><span class="line">cmd/&lt;application&gt;/&lt;context&gt;/&lt;destination-id&gt;/&lt;res-type&gt;</span><br></pre></td></tr></table></figure><ul><li>Command Payload Syntax<ul><li>session id</li><li>response-topic</li></ul></li></ul><h1 id="applications-on-aws"><a class="markdownIt-Anchor" 
href="#applications-on-aws"></a> Applications on AWS</h1>]]></content>
<tags>
<tag> AWS </tag>
<tag> IoT </tag>
<tag> MQTT </tag>
</tags>
</entry>
<entry>
<title>AWS - EC2</title>
<link href="2019/09/17/markdown/AWS/AWS2019/EC2/"/>
<url>2019/09/17/markdown/AWS/AWS2019/EC2/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/cb0KvqGjXRE" target="_blank" rel="noopener">https://youtu.be/cb0KvqGjXRE</a></p></blockquote><h2 id="ec2"><a class="markdownIt-Anchor" href="#ec2"></a> EC2</h2><ul><li>AWS’s vision for EC2: a compute platform for the world that keeps innovating</li><li>EC2 new OS: Amazon Linux 2<ul><li>5 years of support</li><li>You can also use it on-premises</li></ul></li><li>EC2 supports Windows<ul><li>Most Windows workloads in the cloud run on AWS</li></ul></li><li>For BYO licenses, use AWS License Manager</li><li>Specifically optimized for SAP</li></ul><h3 id="deep-dive"><a class="markdownIt-Anchor" href="#deep-dive"></a> Deep dive</h3><ul><li>AWS Nitro System, accelerates the hypervisor layer</li><li>AWS Firecracker, used by Lambda</li></ul><h2 id="serverless"><a class="markdownIt-Anchor" href="#serverless"></a> Serverless</h2><ul><li>Lambda functions are triggered trillions of times per month</li></ul><h2 id="storage"><a class="markdownIt-Anchor" href="#storage"></a> Storage</h2><ul><li>S3 Intelligent-Tiering: automatically categorizes the data</li><li>S3 Glacier Deep Archive: 70% cheaper than Glacier; new product</li></ul><h2 id="hibernate-on-demand"><a class="markdownIt-Anchor" href="#hibernate-on-demand"></a> Hibernate On-demand</h2><h2 id="predictive-scaling-scale-for-you-to-more-cater-for-your-spike-need"><a class="markdownIt-Anchor" href="#predictive-scaling-scale-for-you-to-more-cater-for-your-spike-need"></a> Predictive Scaling: scales for you to better cater for your spikes</h2><h2 id="reference-case"><a class="markdownIt-Anchor" href="#reference-case"></a> Reference Case</h2><ul><li>Small companies can compete with large studios by using AWS (Thinkbox)</li></ul><h2 id="hybrid"><a class="markdownIt-Anchor" href="#hybrid"></a> Hybrid</h2><ul><li>Outposts<ul><li>The compute capacity shows up in your VPC</li><li>VM version and AWS 
version</li></ul></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> EC2 </tag>
</tags>
</entry>
<entry>
<title>AWS - Handson Best Practice</title>
<link href="2019/09/13/markdown/AWS/AWS2019/Handson_bestPractise/"/>
<url>2019/09/13/markdown/AWS/AWS2019/Handson_bestPractise/</url>
<content type="html"><![CDATA[<h1 id="cloudformation"><a class="markdownIt-Anchor" href="#cloudformation"></a> CloudFormation</h1><ul><li>Define the Security Group in a stack separate from the Server</li><li>Otherwise the Server stack cannot be deleted while its Security Group is referenced by other servers</li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> Best Practice </tag>
</tags>
</entry>
<entry>
<title>AWS - Keynotes</title>
<link href="2019/08/27/markdown/AWS/AWS2019/Keynote/"/>
<url>2019/08/27/markdown/AWS/AWS2019/Keynote/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/femopq3JWJg" target="_blank" rel="noopener">https://youtu.be/femopq3JWJg</a></p></blockquote><h1 id="redesign-of-the-db-architecture"><a class="markdownIt-Anchor" href="#redesign-of-the-db-architecture"></a> Redesign of the DB architecture</h1><h2 id="history-of-aurora"><a class="markdownIt-Anchor" href="#history-of-aurora"></a> History of Aurora</h2><ul><li>Cell based architectures<ul><li>Shared storage</li><li>Easy plus one failure mode</li></ul></li><li>The log is the database<ul><li>Across AZs and shards, it’s the log that is being moved, not the data.</li></ul></li><li>Changes happen at the storage layer, which was redesigned to be database-aware.</li></ul><h2 id="history-of-dynamodb"><a class="markdownIt-Anchor" href="#history-of-dynamodb"></a> History of DynamoDB</h2><ul><li>Analysis showed that 70% of queries to relational DBs are just key-value lookups.</li></ul><blockquote><p>DYNAMO</p></blockquote><ul><li>Features<ul><li>automatic re-sharding</li><li>DB migration service (from Oracle to Dynamo)</li></ul></li></ul><h2 id="basic-knowledge-with-aurora-sharding"><a class="markdownIt-Anchor" href="#basic-knowledge-with-aurora-sharding"></a> Basic knowledge with Aurora sharding</h2><ul><li><p>A quorum of 3 copies across 3 AZs is not enough; use 6 copies across 3 AZs</p><ul><li>V=6 (every piece of data has 6 copies in total) means there are 6 nodes holding the same data; a write needs more than 3 of them to be alive.</li></ul></li><li><p>When a sharded DB keeps V copies (servers) of the data, we can calculate how many nodes a write and a read must reach by applying the rules below.</p><ul><li>If V=6 (we have a cluster of 6 servers): V/2=3 and Vw>3, so Vw=4 (a write must reach 4 nodes to be consistent); Vw+Vr>V gives 4+Vr>6, so Vr=3</li></ul><blockquote><p>Vw + Vr > V<br>Vw > V/2</p></blockquote></li></ul><h1 id="data-lake"><a 
class="markdownIt-Anchor" href="#data-lake"></a> Data Lake</h1><ul><li>S3 handles 60 terabits/sec in a single region</li><li>Culture of durability</li><li>11 9s: Time to Fail and Time to Repair</li></ul><h1 id="1-nov-2018-worlds-largest-oracle-dw-to-redshift"><a class="markdownIt-Anchor" href="#1-nov-2018-worlds-largest-oracle-dw-to-redshift"></a> 1 Nov 2018 - World’s largest Oracle DW to Redshift</h1><ul><li>Redshift concurrency scaling<ul><li>consistently fast with thousands of concurrent queries.</li></ul></li></ul><h1 id="demo-fender-music"><a class="markdownIt-Anchor" href="#demo-fender-music"></a> Demo - Fender Music</h1><h1 id="serverless"><a class="markdownIt-Anchor" href="#serverless"></a> Serverless</h1><ul><li>Lambda handles trillions of requests per month</li><li>Randomly spreads the workload across multiple servers</li><li>Lambda Layers (share binaries between different Lambda functions)</li><li>Nested applications with Lambda</li></ul><h1 id="stepfunction"><a class="markdownIt-Anchor" href="#stepfunction"></a> StepFunction</h1><ul><li>Services that can be orchestrated by Step Functions<ul><li>Batch, ECS, Fargate, Glue, DynamoDB, SNS, SQS, SageMaker</li></ul></li></ul><h1 id="api-gateway"><a class="markdownIt-Anchor" href="#api-gateway"></a> API Gateway</h1><ul><li>WebSocket support for API Gateway</li><li>Move things from EC2 to serverless without changing the API</li></ul><h2 id="kinesis-and-managed-streaming-for-kafka"><a class="markdownIt-Anchor" href="#kinesis-and-managed-streaming-for-kafka"></a> Kinesis and Managed Streaming for Kafka</h2><ul><li>Video and audio are becoming streaming data</li><li>Kinesis Family</li></ul><h1 id="demo-nab"><a class="markdownIt-Anchor" href="#demo-nab"></a> Demo - NAB</h1><ul><li>Culture: you build it, you fix it (craftsmanship)</li><li>35% of applications in the cloud by 2020</li></ul>]]></content>
<tags>
<tag> AWS </tag>
</tags>
</entry>
<entry>
<title>Gazebo</title>
<link href="2019/08/16/markdown/Trending/Robotics/Robotics/"/>
<url>2019/08/16/markdown/Trending/Robotics/Robotics/</url>
<content type="html"><![CDATA[<h1 id="gazebo"><a class="markdownIt-Anchor" href="#gazebo"></a> Gazebo</h1><p><a href="http://gazebosim.org/" target="_blank" rel="noopener">http://gazebosim.org/</a></p>]]></content>
<tags>
<tag> Robotics </tag>
<tag> Gazebo </tag>
</tags>
</entry>
<entry>
<title>AWS - CloudMap</title>
<link href="2019/08/15/markdown/AWS/AWS2019/CloudMap/"/>
<url>2019/08/15/markdown/AWS/AWS2019/CloudMap/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/fMGd9IUaotE" target="_blank" rel="noopener">https://youtu.be/fMGd9IUaotE</a></p><h2 id="service-registers"><a class="markdownIt-Anchor" href="#service-registers"></a> Service registries</h2><ul><li>Zookeeper, Eureka, SmartStack, SkyDNS, Doozerd, etcd, etc.</li><li>CloudMap : dynamic map of your cloud</li></ul><h1 id="issue-try-to-solve"><a class="markdownIt-Anchor" href="#issue-try-to-solve"></a> Issues it tries to solve</h1><ul><li><p>Attribute-based service discovery in complex service environments</p><ul><li>Multiple stages</li><li>Multiple versions</li><li>Multiple statuses</li></ul></li><li><p>Handle partial failure</p><ul><li>provisions Route53 for you to help handle partial failure</li></ul></li></ul><h1 id="integrate-with-existing-aws-service"><a class="markdownIt-Anchor" href="#integrate-with-existing-aws-service"></a> Integrates with existing AWS services</h1><ul><li>CloudFormation</li><li>IAM</li></ul><h2 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">dig +short A backend.cloudmapdemo.com</span><br><span class="line">172.31.1.228</span><br><span class="line">172.31.0.100</span><br></pre></td></tr></table></figure><h2 id="work-with-consul"><a class="markdownIt-Anchor" href="#work-with-consul"></a> Work with Consul</h2><ul><li>AWS Cloud Map and Consul can be combined to extend hybrid infrastructure to multiple regions</li></ul><blockquote></blockquote><p><a href="https://www.youtube.com/watch?v=fMGd9IUaotE&list=PL72BC_ThTrzW0wfjYWsPIG-sRb920Ubs3&index=24&t=614s" target="_blank" rel="noopener">https://www.youtube.com/watch?v=fMGd9IUaotE&list=PL72BC_ThTrzW0wfjYWsPIG-sRb920Ubs3&index=24&t=614s</a></p><h2 
id="feeling"><a class="markdownIt-Anchor" href="#feeling"></a> Feeling</h2><ul><li>A solution based on Route53.</li><li>A service discovery service.</li><li>Given a namespace and a service name, it returns a list of service endpoints.</li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> CloudMap </tag>
</tags>
</entry>
<entry>
<title>AWS - DotNet</title>
<link href="2019/08/15/markdown/AWS/AWS2019/DotNet/"/>
<url>2019/08/15/markdown/AWS/AWS2019/DotNet/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/FteCJQcTDc4" target="_blank" rel="noopener">https://youtu.be/FteCJQcTDc4</a></p><h2 id="modern-net-applications-on-aws"><a class="markdownIt-Anchor" href="#modern-net-applications-on-aws"></a> Modern .NET applications on AWS</h2><p>Mosaic image</p><ul><li>Services used: .NET tools, Lambda, X-Ray, ECR, Fargate, DynamoDB, Cognito, S3, CodePipeline, SQS, Step Functions, AWS Batch, SSM parameters, CloudFormation</li></ul><h3 id="demo-use-visual-studio-to-cicd"><a class="markdownIt-Anchor" href="#demo-use-visual-studio-to-cicd"></a> Demo : use Visual Studio for CI/CD</h3><ul><li>AWS Batch<ul><li>Works as a queue; able to use EC2 Spot Instances</li></ul></li><li>From Visual Studio, you can directly publish the code to build a Docker image and push it to AWS ECR<ul><li>The code logic is to download the picture and upload it to the corresponding S3 Raw folder</li></ul></li><li>From Visual Studio, directly publish a Lambda function<ul><li>In Lambda, registering X-Ray lets X-Ray drill down into the details of each invocation</li></ul></li><li>CodePipeline</li><li>Use Step Functions to link all the functions</li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> DotNet </tag>
</tags>
</entry>
<entry>
<title>AWS - MachineLearning</title>
<link href="2019/08/14/markdown/AWS/AWS2019/MachineLearning/"/>
<url>2019/08/14/markdown/AWS/AWS2019/MachineLearning/</url>
<content type="html"><![CDATA[<h1 id="hands-on"><a class="markdownIt-Anchor" href="#hands-on"></a> Hands-on</h1><p><a href="https://s3.amazonaws.com/solutions-reference/predictive-maintenance-using-machine-learning/latest/predictive-maintenance-using-machine-learning.pdf" target="_blank" rel="noopener">https://s3.amazonaws.com/solutions-reference/predictive-maintenance-using-machine-learning/latest/predictive-maintenance-using-machine-learning.pdf</a></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Couldn't call 'describe_notebook_instance' to get the Role ARN of the instance PredictiveMaintenanceNotebookInstance.</span><br></pre></td></tr></table></figure><p>Fix: update the role attached to the SageMaker notebook instance</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ResourceLimitExceeded</span><br></pre></td></tr></table></figure><p>Fix: change to train_instance_type = ‘ml.p2.xlarge’</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/GW0Bktm55nI" target="_blank" rel="noopener">https://youtu.be/GW0Bktm55nI</a></p><h2 id="aws-machine-learning-stack"><a class="markdownIt-Anchor" href="#aws-machine-learning-stack"></a> AWS Machine Learning Stack</h2><ul><li>ML Frameworks & Infrastructures<ul><li>Frameworks<ul><li>TensorFlow (85% of TensorFlow workloads in the cloud run on AWS)</li><li>Apache MXNet – deep learning for enterprise developers; linearly scalable</li><li>PyTorch – Facebook; flexible, versatile and portable</li><li>AWS is framework agnostic</li></ul></li></ul></li><li>ML Services<ul><li>SageMaker workflows</li><li>SageMaker Ground Truth</li><li>Use SageMaker for reinforcement learning (RL), for example vehicle routing</li><li>SageMaker Neo (open source)<ul><li>Accelerate the cycle of doing Machine 
learning</li><li>CICD</li><li>Optimize between different frameworks</li></ul></li></ul></li><li>AI Services<ul><li>Textract</li></ul></li></ul><h2 id="ge-healthcare-demo"><a class="markdownIt-Anchor" href="#ge-healthcare-demo"></a> GE Healthcare Demo</h2><ul><li>Neural network compression: remove layers and retrain the model</li><li>How to achieve network compression using AWS services<ul><li>Use SageMaker RL<ul><li>State: current network architecture</li><li>Action: remove a layer or not</li><li>Reward: accuracy + compression ratio</li></ul></li><li>Result: 40% smaller model with a 1%-2% loss of accuracy</li></ul></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> Machine Learning </tag>
</tags>
</entry>
<entry>
<title>AWS - Encryption</title>
<link href="2019/08/14/markdown/AWS/AWS2019/Security/"/>
<url>2019/08/14/markdown/AWS/AWS2019/Security/</url>
<content type="html"><![CDATA[<h1 id="reference-s3-sse-kms"><a class="markdownIt-Anchor" href="#reference-s3-sse-kms"></a> Reference - S3 SSE-KMS</h1><blockquote><p><a href="https://youtu.be/jZYkJf-9yXI" target="_blank" rel="noopener">https://youtu.be/jZYkJf-9yXI</a></p></blockquote><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/Encryption_KMS.PNG?raw=true" alt="Encryption_KMS.PNG"></p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/Encryption_KMS_1.PNG?raw=true" alt="Encryption_KMS_1.PNG"></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Encryption </tag>
</tags>
</entry>
<entry>
<title>Server Hardware</title>
<link href="2019/08/13/markdown/BackToBasic/Hardware/"/>
<url>2019/08/13/markdown/BackToBasic/Hardware/</url>
<content type="html"><![CDATA[<p>HPE Ethernet 10Gb 2-port 562FLR-SFP+ Adapter</p><p>FLR: FlexibleLOM for rack servers (plugs into the LAN-on-motherboard slot)<br>SFP: small form-factor pluggable transceiver (commonly fiber)<br>SFP+ : enhanced SFP that supports 10G</p>]]></content>
<tags>
<tag> hardware </tag>
<tag> Server </tag>
</tags>
</entry>
<entry>
<title>AWS - ELB</title>
<link href="2019/08/13/markdown/AWS/AWS2019/ELB/"/>
<url>2019/08/13/markdown/AWS/AWS2019/ELB/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/VIgAT7vjol8" target="_blank" rel="noopener">https://youtu.be/VIgAT7vjol8</a></p></blockquote><h2 id="elastic-load-balancing-deep-dive-and-best-practices-2018"><a class="markdownIt-Anchor" href="#elastic-load-balancing-deep-dive-and-best-practices-2018"></a> Elastic Load Balancing: Deep Dive and Best Practices - 2018</h2><ul><li><p>Differences between Layer 4 and Layer 7 load balancing</p><ul><li>Layer 4 supports TCP; Layer 7 only supports HTTP and HTTPS (and terminates TLS)</li><li>Layer 7: connections are terminated and pooled</li><li>Layer 7: headers can be modified</li><li>The X-Forwarded-For HTTP header will be modified</li></ul></li><li><p>Product mapping</p><ul><li>Application LB is the Layer 7 LB; Network LB is the Layer 4 LB</li></ul></li></ul><h2 id="alb"><a class="markdownIt-Anchor" href="#alb"></a> ALB</h2><ul><li>ALB supports path- and host-based routing (a single ELB dispatches all traffic); deep integration with EKS – microservice architecture</li><li>ALB can do redirects; fixed responses; slow start (configurable, e.g. 10 min); IPv4 and IPv6 support</li><li>ALB certificate updates<ul><li>IAM controls who has access to update them</li><li>Use ACM (AWS Certificate Manager) to directly push and rotate certs on the ALB</li></ul></li><li>Integrates with AWS WAF</li><li>Server Name Indication (SNI): load balancing multiple applications that have multiple certs</li><li>Authentication at the ALB layer (OIDC, Cognito, SAML)</li><li>Multi-AZ (by default) and no extra bandwidth charge</li><li>Absorbs impact of DNS caching (?)</li><li>Health checks: checking the HTTP status code is recommended; works with Auto Scaling</li></ul><h2 id="nlb"><a class="markdownIt-Anchor" href="#nlb"></a> NLB</h2><ul><li>Millions of requests per second</li><li>Static IP for each AZ<ul><li>Firewall example: 2 layers of NLB; fewer static IPs simplify the firewall config</li><li>Route 
53 will route to multiple static IP addresses in different AZs.</li></ul></li><li>Supports Proxy Protocol v2</li><li>CloudWatch metrics for NLB; it also has flow logs</li></ul><h2 id="netflix-demo-identity-platform"><a class="markdownIt-Anchor" href="#netflix-demo-identity-platform"></a> Netflix Demo – Identity Platform</h2><ul><li>Workforce Identity-as-a-Service</li><li>Federate All The Things</li><li>Developer Self-Service<ul><li>SSO; SAML, OAuth2</li></ul></li></ul><h3 id="challenging-with-identity-solution"><a class="markdownIt-Anchor" href="#challenging-with-identity-solution"></a> Challenges with identity solutions</h3><ul><li>Always catching up with new languages and frameworks</li><li>Open source is of varying quality</li><li>Developer friction around configuration</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/ALB_OpenIDSupport.PNG?raw=true" alt="ALB_OpenIDSupport.PNG"></p><ul><li>Spinnaker</li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> ELB </tag>
</tags>
</entry>
<entry>
<title>AWS - DevOps 2019</title>
<link href="2019/08/06/markdown/AWS/AWS2019/DevOps/"/>
<url>2019/08/06/markdown/AWS/AWS2019/DevOps/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p>Empowering DevOps for Secure by Design (see it live)</p><blockquote><p><a href="https://youtu.be/8UG9E5moCdo" target="_blank" rel="noopener">https://youtu.be/8UG9E5moCdo</a></p></blockquote><ul><li>Workloads are provisioned in minutes, so security also needs to be addressed in minutes.<ul><li>Automated security provisioning</li><li>Secure-by-Design</li></ul></li><li>IBM CloudDeployment Services: multi-cloud support</li></ul><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><p>Enterprise DevOps: Patterns of Efficiency</p><blockquote><p><a href="https://youtu.be/qyhuMDozWXk" target="_blank" rel="noopener">https://youtu.be/qyhuMDozWXk</a></p></blockquote><h2 id="devops-vs-itil-devops-vs-cicd-enterprise-devops-vs-devops-for-startups"><a class="markdownIt-Anchor" href="#devops-vs-itil-devops-vs-cicd-enterprise-devops-vs-devops-for-startups"></a> DevOps vs ITIL ; DevOps vs CICD ; Enterprise DevOps vs DevOps for Startups</h2><ul><li>DevOps shares core values with ITIL</li><li>Enterprise DevOps<ul><li>Insource value creation</li><li>DevOps legacy apps</li><li>Culture of inclusion</li></ul></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/DevOps_EnterpriseDevOps.PNG?raw=true" alt="DevOps_EnterpriseDevOps"></p><h2 id="enterprise-devops-case-study-nab"><a class="markdownIt-Anchor" href="#enterprise-devops-case-study-nab"></a> Enterprise DevOps Case study: NAB</h2><ul><li>Outsourcing everything results in losing the capability to innovate</li><li>Automating for successful DevOps</li></ul><h2 id="enterprise-devops-case-study-vendor"><a class="markdownIt-Anchor" href="#enterprise-devops-case-study-vendor"></a> Enterprise DevOps Case study: Vendor</h2><ul><li>Migrate to the cloud quickly and securely</li><li>Security is not a roadblock</li><li>Challenges of scale with Security – Automation and tools and 
SME</li><li>Preventative ; Detective; Remediation. Try to shift Left (earlier)<ul><li>AWS Service Catalog</li></ul></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> DevOps </tag>
</tags>
</entry>
<entry>
<title>AWS - Digital Transformation</title>
<link href="2019/08/05/markdown/AWS/AWS2019/DigitalTransformation/"/>
<url>2019/08/05/markdown/AWS/AWS2019/DigitalTransformation/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/4Gr7hv24jK4" target="_blank" rel="noopener">https://youtu.be/4Gr7hv24jK4</a></p></blockquote><h2 id="culture-skills-organization-finance"><a class="markdownIt-Anchor" href="#culture-skills-organization-finance"></a> Culture, Skills, Organization, Finance</h2><ul><li><strong>Culture</strong><ul><li><strong>If you want to build a ship, don’t drum up the people to gather the wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea</strong></li><li>Use good judgement instead of process (security, flexibility, HA)</li><li>Ahead in the Cloud: “Best Practices for Navigating the Future of Enterprise IT”</li><li>A Seat at the Table</li></ul></li><li><strong>Skill</strong><ul><li>Training and compensation</li><li>Recommended book: POWERFUL</li></ul></li><li><strong>Organization</strong><ul><li>Move from projects to product teams<ul><li>CD; DevOps, “run what you wrote”; reduce tech-debt and lock-in</li><li>The Phoenix Project ; The DevOps Handbook</li></ul></li></ul></li><li><strong>Capex vs Opex</strong><ul><li>CTO or CFO: who decides the IT structure?</li><li>With cloud, it’s hard to go Capex (pay as you go)</li></ul></li></ul><h2 id="pathway-to-digital-transformation"><a class="markdownIt-Anchor" href="#pathway-to-digital-transformation"></a> Pathway to digital transformation</h2><ul><li>Time to value: try to do simple things quickly<ul><li>elite companies are 2555* times faster than slow companies</li></ul></li><li>Distributed optimized capacity<ul><li>Scale, HA, cost-optimized; cloud native</li></ul></li><li>Critical workloads data center replacement : Strategic<ul><li>Who runs the “fire drill” for IT ?<ul><li><strong>Chaos Engineering</strong> (Book)</li></ul></li></ul></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> Digital Transformation </tag>
</tags>
</entry>
<entry>
<title>AWS - VPC</title>
<link href="2019/08/05/markdown/AWS/AWS2019/VPC/"/>
<url>2019/08/05/markdown/AWS/AWS2019/VPC/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/ar6sLmJ45xs" target="_blank" rel="noopener">https://youtu.be/ar6sLmJ45xs</a></p></blockquote><ul><li>North-South :</li><li>West-East :</li></ul><h2 id="challenge-with-current-vpc-architecture"><a class="markdownIt-Anchor" href="#challenge-with-current-vpc-architecture"></a> Challenge with current VPC architecture</h2><ul><li>Lots of VPCs, lots of connections and lots of peering<ul><li>VPC peering : not transitive</li><li>Transit VPC (VPCs with 10.1.0.0/16 and 10.2.0.0/16 go through a transit VPC of 10.0.0.0/16)</li><li>Transit Gateway (2018)</li></ul></li></ul><h2 id="transit-gateway-2018-tgw"><a class="markdownIt-Anchor" href="#transit-gateway-2018-tgw"></a> Transit Gateway (2018) – tgw</h2><ul><li><p>Centralize VPN and AWS Direct Connect</p></li><li><p>5k VPCs across accounts</p></li><li><p>Flexible</p><ul><li>Control segmentation and sharing with routing</li></ul></li><li><p>Compared with transit VPC</p><ul><li>AWS built-in service</li></ul></li><li><p>AWS HyperPlane</p><ul><li>Backbone of NLB, NAT Gateway, EFS and now Transit Gateway</li><li>Region wide scope</li></ul></li></ul><h3 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_flat.PNG?raw=true" alt="vpc_flat"></p><ul><li>Flat : every VPC can talk to every other VPC.</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_isolated.PNG?raw=true" alt="vpc_isolated"></p><ul><li>VPN: all traffic needs to go through the VPN</li></ul><h3 id="reference-network-architecture"><a class="markdownIt-Anchor" href="#reference-network-architecture"></a> Reference Network Architecture</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_reference_arch.PNG?raw=true" alt="vpc_arch"></p><h3 
id="new-feature-vpc-sharing-and-resource-access-manager"><a class="markdownIt-Anchor" href="#new-feature-vpc-sharing-and-resource-access-manager"></a> New Feature: VPC Sharing and Resource Access Manager</h3><ul><li>external account managing public subnet</li><li>internal account managing private subnet</li><li>sharing a VPC across different accounts makes resources more flexible and avoids VPC peering in some cases</li></ul><h3 id="segmentation-considerations"><a class="markdownIt-Anchor" href="#segmentation-considerations"></a> Segmentation considerations</h3><ul><li>SG and IAM are effective and proven</li><li>Shared VPCs vs VPC peering : a shared VPC can span multiple accounts</li><li>Separate VPCs + Transit Gateway : simplest design without scaling issues (peering, VPCs, routes)</li></ul><h3 id="sharing-considerations"><a class="markdownIt-Anchor" href="#sharing-considerations"></a> Sharing considerations</h3><ul><li>VPC peering (max 100 VPCs); supports inter-region</li><li>AWS PrivateLink : supports overlapping CIDRs (using ELB)</li><li>AWS Transit VPC : shared services as a spoke</li><li>Transit Gateway : most advanced option</li></ul><h3 id="connecting-to-on-premises"><a class="markdownIt-Anchor" href="#connecting-to-on-premises"></a> Connecting to on-premises</h3><ul><li>Virtual Private Gateway VPN</li><li>Direct Connect</li><li>Customer VPN</li><li>Transit Gateway VPN</li></ul><h3 id="43min-an-advanced-use-case"><a class="markdownIt-Anchor" href="#43min-an-advanced-use-case"></a> 43min : an advanced use case (???)</h3><h3 id="reminder"><a class="markdownIt-Anchor" href="#reminder"></a> Reminder</h3><ul><li>Moving existing DMZs to the cloud might not be a good idea</li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> Transit VPC </tag>
<tag> Transit Gateway </tag>
</tags>
</entry>
<entry>
<title>AWS - CloudFormation 2019</title>
<link href="2019/08/01/markdown/AWS/AWS2019/CloudFormation/"/>
<url>2019/08/01/markdown/AWS/AWS2019/CloudFormation/</url>
<content type="html"><![CDATA[<h1 id="whats-new"><a class="markdownIt-Anchor" href="#whats-new"></a> What’s New</h1><ul><li>more resources including Alexa and custom resources</li></ul><h2 id="managing-enterprise-complexity"><a class="markdownIt-Anchor" href="#managing-enterprise-complexity"></a> Managing enterprise complexity</h2><ul><li>Seamless handling of secrets</li><li>StackSet – override</li></ul><h2 id="improved-handling-of-secrets"><a class="markdownIt-Anchor" href="#improved-handling-of-secrets"></a> Improved handling of secrets</h2><ul><li>Use SSM (or Secrets Manager) dynamic references to handle dynamic parameters</li></ul><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">MasterUsername:</span> <span class="string">'{{resolve:secretsmanager:MyRDSSecrets:SecretString:username}}'</span></span><br></pre></td></tr></table></figure><ul><li><p>AWS CloudFormation Macros</p><ul><li>Iteration</li><li>Transformation</li></ul></li><li><p>CloudFormation Linter</p><ul><li>Scripted --> Declarative --> DSLs --> Imperative</li></ul></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> CloudFormation </tag>
</tags>
</entry>
<entry>
<title>AWS - Big Data Analytics Options (Whitepaper)</title>
<link href="2019/08/01/markdown/AWS/AWS2019/Whitepapers_BigDataOptions/"/>
<url>2019/08/01/markdown/AWS/AWS2019/Whitepapers_BigDataOptions/</url>
<content type="html"><![CDATA[<blockquote><p><a href="https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf</a></p></blockquote><h1 id="amazon-kinesis"><a class="markdownIt-Anchor" href="#amazon-kinesis"></a> Amazon Kinesis</h1><ul><li>Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data.<ul><li>Capture and store <strong>terabytes</strong> of data per hour from <strong>hundreds of thousands of sources</strong></li><li>Store a cursor in DynamoDB</li></ul></li><li>Amazon Kinesis Video Streams enables you to build custom applications that process or analyze streaming video.</li><li>Amazon Kinesis Data Firehose enables you to deliver real-time streaming data to AWS destinations such as Amazon S3, Amazon Redshift, Amazon Kinesis Analytics, and Amazon Elasticsearch Service.</li><li>Amazon Kinesis Data Analytics enables you to process and analyze streaming data with standard SQL.</li></ul><h1 id="lambda"><a class="markdownIt-Anchor" href="#lambda"></a> Lambda</h1><ul><li>Default concurrency limit is 1000</li></ul><h2 id="anti-pattern"><a class="markdownIt-Anchor" href="#anti-pattern"></a> Anti-pattern</h2><ul><li>Long-running applications</li><li>Dynamic Websites</li><li>Stateful Applications</li></ul><h1 id="emr"><a class="markdownIt-Anchor" href="#emr"></a> EMR</h1><h2 id="anti-pattern-2"><a class="markdownIt-Anchor" href="#anti-pattern-2"></a> Anti-pattern</h2><ul><li>Small data sets: Amazon EMR is built for massively parallel processing</li><li>ACID transaction requirements</li></ul><h1 id="glue"><a class="markdownIt-Anchor" href="#glue"></a> Glue</h1><h2 id="anti-pattern-3"><a class="markdownIt-Anchor" href="#anti-pattern-3"></a> Anti-Pattern</h2><ul><li>Data streaming</li><li>Glue is PySpark based</li><li>NoSQL DB not supported</li></ul>]]></content>
<tags>
<tag> Big Data </tag>
<tag> AWS White Paper </tag>
</tags>
</entry>
<entry>
<title>AWS - CloudWatch 2019</title>
<link href="2019/07/29/markdown/AWS/AWS2019/Cloudwatch/"/>
<url>2019/07/29/markdown/AWS/AWS2019/Cloudwatch/</url>
<content type="html"><![CDATA[<h1 id="some-numbers-about-cloudwatch"><a class="markdownIt-Anchor" href="#some-numbers-about-cloudwatch"></a> Some numbers about CloudWatch</h1><ul><li>as of Oct 2018, 100 petabytes of logs per month</li><li>CloudWatch egress<ul><li>S3; Lambda; Elasticsearch; Kinesis Firehose</li></ul></li></ul><h1 id="cloudwatch-logs-insight"><a class="markdownIt-Anchor" href="#cloudwatch-logs-insight"></a> CloudWatch Logs Insights</h1><p>A feature similar to Elasticsearch-style log search. (hands-on: investigating a traffic security issue)</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/g1wxfYVjCPY" target="_blank" rel="noopener">https://youtu.be/g1wxfYVjCPY</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Cloudwatch </tag>
<tag> Cloudwatch Insights </tag>
</tags>
</entry>
<entry>
<title>AWS - Amazon Lambda</title>
<link href="2019/07/25/markdown/AWS/AWS2019/Lambda/"/>
<url>2019/07/25/markdown/AWS/AWS2019/Lambda/</url>
<content type="html"><![CDATA[<h1 id="a-serverless-journey-aws-lambda-under-the-hood"><a class="markdownIt-Anchor" href="#a-serverless-journey-aws-lambda-under-the-hood"></a> A Serverless Journey: AWS Lambda Under the Hood</h1><h2 id="lambda-load-balancing"><a class="markdownIt-Anchor" href="#lambda-load-balancing"></a> Lambda Load Balancing</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/lambda_components.png?raw=true" alt="lambda_components"></p><ul><li><p><strong>Front End Invoke</strong>: authenticates the caller, loads configs & env; confirms concurrency with the <strong>Counting Service</strong></p></li><li><p><strong>Counting Service</strong>: region-wide view of concurrency to help enforce limits (quorum protocol, 2/3 agreement protocol); <1.5 milliseconds response time</p></li><li><p><strong>Worker Manager</strong> : assumes the role, tracks the container lifecycle (running, idle) and maintains the worker pool</p></li><li><p><strong>Worker</strong> : provisions the sandbox, downloads the customer code and runs it;<br>* a warm sandbox is a sandbox that has finished a previous run<br>* a sandbox is the equivalent of a docker image</p></li><li><p><strong>Placement Service</strong>: provisions workers</p></li><li><p>Example,</p><ul><li>Fannie Mae scales between 20 and 50,000 concurrent executions within minutes.</li></ul></li></ul><h2 id="lambda-handling-failures"><a class="markdownIt-Anchor" href="#lambda-handling-failures"></a> Lambda Handling Failures</h2><ul><li>Multi-AZ</li></ul><h2 id="security-isolation"><a class="markdownIt-Anchor" href="#security-isolation"></a> Security Isolation</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/lambda_layers.png?raw=true" alt="lambda_layers"></p><ul><li>EC2 as worker level</li><li>EC2 Bare Metal as worker level (no hardware shared with other accounts)<ul><li>Firecracker mode</li></ul></li><li>Virtual devices have very limited access to improve security</li></ul><h2 
id="managing-utilization"><a class="markdownIt-Anchor" href="#managing-utilization"></a> Managing Utilization</h2><ul><li>Keep the servers busy</li><li>Utilization is handled by AWS<ul><li>Lambda has different algorithms to spread the load (or concentrate the load)</li><li>Lambda packs different, uncorrelated workloads onto one server so that similar workloads do not spike all together.</li></ul></li></ul><h2 id="lambda-benefit"><a class="markdownIt-Anchor" href="#lambda-benefit"></a> Lambda benefits</h2><ul><li>Load Balancing</li><li>Auto Scaling</li><li>Handling Failures</li><li>Security Isolation</li><li>Managing Utilization</li></ul><h2 id="new-features"><a class="markdownIt-Anchor" href="#new-features"></a> New features</h2><ul><li>Change introduced in 2019<ul><li>Lambda connects out via a shared remote NAT to an ENI and then to the outside</li></ul></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/QdzV04T_kec" target="_blank" rel="noopener">https://youtu.be/QdzV04T_kec</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Lambda </tag>
<tag> Serverless </tag>
</tags>
</entry>
<entry>
<title>AWS - IoT</title>
<link href="2019/07/23/markdown/AWS/AWS2019/IoT/"/>
<url>2019/07/23/markdown/AWS/AWS2019/IoT/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/LbeWdLaXYDo" target="_blank" rel="noopener">https://youtu.be/LbeWdLaXYDo</a></p></blockquote><h1 id="home-automation-home-security-home-networking"><a class="markdownIt-Anchor" href="#home-automation-home-security-home-networking"></a> Home automation ; Home security ; Home networking</h1><p>FreeRTOS / Greengrass --> IoT Core, management, Analytics / Database, ML --> IoT applications</p><h2 id="demo-from-vestel"><a class="markdownIt-Anchor" href="#demo-from-vestel"></a> DEMO from Vestel</h2><ul><li>VESTEL</li><li>Dedicated IoT group</li><li>Highlights of the current architecture<ul><li>Use IoT Core</li><li>Use API Gateway to serve both Alexa and Google Home</li><li>Use Lambda to run logic against IoT Core and go serverless</li></ul></li></ul><h2 id="simplify-large-number-of-iot-devices"><a class="markdownIt-Anchor" href="#simplify-large-number-of-iot-devices"></a> Simplify large numbers of IoT devices</h2><ul><li>WPA3 specification: new device provisioning protocol</li><li>Use a mobile phone to scan the device’s barcode and get its public key; the router then automatically allows the device to connect to the internet.</li></ul><h2 id="home-security-monitoring"><a class="markdownIt-Anchor" href="#home-security-monitoring"></a> Home Security & Monitoring</h2><ul><li>Amazon FreeRTOS</li><li>AWS Greengrass allows local FreeRTOS devices to communicate with each other</li><li>SageMaker: train the model --> export the model to S3</li><li>IoT Core: create a rule that subscribes to the sound data and passes it to a Lambda function, which calls the trained model to detect the sound.</li><li>Push the model to Greengrass (local); devices can then send data to the local Greengrass to run the same Lambda function.</li><li>Greengrass discovery – a device can discover and connect to the Greengrass device</li></ul><h2 id="home-networking"><a 
class="markdownIt-Anchor" href="#home-networking"></a> Home networking</h2><ul><li>Greengrass as a hub</li><li>Use IoT Device Defender to detect unusual publishing</li></ul><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><blockquote><p><a href="https://youtu.be/HEQkVHxu46A" target="_blank" rel="noopener">https://youtu.be/HEQkVHxu46A</a></p></blockquote><h2 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h2><h3 id="edge-endpoint-amazon-freertos"><a class="markdownIt-Anchor" href="#edge-endpoint-amazon-freertos"></a> Edge / Endpoint : Amazon FreeRTOS</h3><ul><li>OTA: over-the-air updates</li></ul><h3 id="device-gateway-greengrass-core"><a class="markdownIt-Anchor" href="#device-gateway-greengrass-core"></a> Device Gateway: GreenGrass Core</h3><ul><li>Can be on-premises or in the cloud</li><li>Protocol : MQTT, WebSockets, HTTP</li><li>TLS 1.2 Only</li><li>Message Broker</li></ul><h3 id="device-management-iot-device-management"><a class="markdownIt-Anchor" href="#device-management-iot-device-management"></a> Device Management: IoT Device Management</h3><ul><li>Batch fleet provisioning</li><li>Search devices</li></ul><h3 id="iot-device-defender"><a class="markdownIt-Anchor" href="#iot-device-defender"></a> IoT Device Defender</h3><ul><li>Audit device config / monitor / identify anomalies / alerts / patch</li><li>For example, security best practice checks (certificate sharing)</li></ul><h3 id="iot-analytics"><a class="markdownIt-Anchor" href="#iot-analytics"></a> IoT Analytics</h3><ul><li>Pipelines --> Analysis / ML</li></ul><h3 id="other-features"><a class="markdownIt-Anchor" href="#other-features"></a> Other features</h3><ul><li>1-Click: provisioned devices, 
like the AWS purchase button.</li></ul><h2 id="demo-modjoul"><a class="markdownIt-Anchor" href="#demo-modjoul"></a> Demo – Modjoul</h2><ul><li><p>8 sensors, 50 MB of data per person per day</p></li><li><p>2 weeks of data stored locally</p></li><li><p>Use IoT Analytics to replace EMR</p></li><li><p>Comment: I DON’T LIKE THIS SOLUTION… haha</p></li></ul>]]></content>
<tags>
<tag> AWS </tag>
<tag> IoT </tag>
<tag> WPA3 </tag>
</tags>
</entry>
<entry>
<title>AWS - Polly</title>
<link href="2019/07/23/markdown/AWS/AWS2019/Polly/"/>
<url>2019/07/23/markdown/AWS/AWS2019/Polly/</url>
<content type="html"><![CDATA[<p><a href="https://aws.amazon.com/blogs/machine-learning/build-your-own-text-to-speech-applications-with-amazon-polly/#" target="_blank" rel="noopener">https://aws.amazon.com/blogs/machine-learning/build-your-own-text-to-speech-applications-with-amazon-polly/#</a></p><ul><li>The Lambda runtime changed to Python 3.7<ul><li>Two lines of the code need to be updated.</li></ul></li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">print</span>(<span class="string">"Text to Speech function. Post ID in DynamoDB: "</span> + postId)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># In Python 3 it matters whether a file is opened in binary or text mode. Add the b flag to open it in binary mode:</span></span><br><span class="line"><span class="keyword">with</span> open(output, <span class="string">"ab"</span>) <span class="keyword">as</span> file:</span><br><span class="line">    file.write(stream.read())</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> AWS </tag>
<tag> Polly </tag>
</tags>
</entry>
<entry>
<title>AWS - Amazon Route 53</title>
<link href="2019/07/23/markdown/AWS/AWS2019/Route53/"/>
<url>2019/07/23/markdown/AWS/AWS2019/Route53/</url>
<content type="html"><![CDATA[<h1 id="route53-resolver-released-in-201812"><a class="markdownIt-Anchor" href="#route53-resolver-released-in-201812"></a> Route53 Resolver (Released in 2018.12)</h1><ul><li>Issue: in a hybrid architecture, the VPC can’t resolve data center names and the data center can’t resolve VPC private DNS names.</li><li>Traditional workaround:<ul><li>spin up EC2 instances running bind or unbound as DNS servers that forward requests to the VPC’s plus-2 resolver (VPC CIDR base address + 2)</li><li>you need to consider failover, and sometimes a group of DNS servers per VPC</li></ul></li><li>This requirement is called recursive DNS lookup.</li></ul><h2 id="how-route53-resolver-works"><a class="markdownIt-Anchor" href="#how-route53-resolver-works"></a> How Route53 Resolver works</h2><ul><li>Works within a single region only (can’t span regions)</li><li>Multiple VPCs under multiple accounts (as long as they are in the same region) can share the same Resolver endpoint</li><li>ENIs must be provisioned for the resolver; for HA and performance, provisioning multiple ENIs is recommended<ul><li>Each ENI serves one direction of queries (for example, from the VPC to on-premises)</li></ul></li><li>When a resolution request is received, it is checked against all resolver rules; if none match, it is treated as local.<ul><li>Rules can be shared between accounts (via Resource Access Manager – RAM)</li></ul></li></ul><h2 id="route-53-resolver-demo"><a class="markdownIt-Anchor" href="#route-53-resolver-demo"></a> Route 53 Resolver Demo</h2><ul><li><p>Resolving sequence</p><ul><li>Auto-defined rules: VPC / Private Hosted Zones / Internet Resolver</li><li>Extra rules<ul><li>Tip: make a “.” rule the default forwarding rule, so anything that does not fit the auto-defined rules goes to <a href="http://ns.mycompany.com" target="_blank" rel="noopener">ns.mycompany.com</a></li><li>Tip: <a href="http://ns.mycompany.com" target="_blank" rel="noopener">ns.mycompany.com</a> has a “.” rule that recurses requests to the internet if no other rule matches</li><li>Tip: a rule to <strong>forward</strong> 
any request for <a href="http://acquiredcompany.com" target="_blank" rel="noopener">acquiredcompany.com</a> to <a href="http://ns.acquiredcompany.com" target="_blank" rel="noopener">ns.acquiredcompany.com</a></li></ul></li></ul></li><li><p>API used to create endpoints:</p><ul><li>Endpoints need an attached security group that allows port 53</li><li>API to create rules</li><li>API to share defined rules</li></ul></li><li><p>Monitoring: CloudWatch and CloudTrail</p></li></ul><h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><p>Authoritative DNS<br>Recursive DNS</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/D1n5kDTWidQ" target="_blank" rel="noopener">https://youtu.be/D1n5kDTWidQ</a></p></blockquote>]]></content>
<tags>
<tag> AWS </tag>
<tag> Route53 </tag>
<tag> Hybrid Cloud </tag>
</tags>
</entry>
<entry>
<title>AWS - Database family</title>
<link href="2019/07/22/markdown/AWS/AWS2019/BestPractice_selectDataLayer/"/>
<url>2019/07/22/markdown/AWS/AWS2019/BestPractice_selectDataLayer/</url>
<content type="html"><![CDATA[<h1 id="use-the-right-tool-for-the-right-job"><a class="markdownIt-Anchor" href="#use-the-right-tool-for-the-right-job"></a> Use the right Tool for the right job</h1><p>Aurora benefits:</p><ul><li><p>5x the throughput of MySQL and 3x that of PostgreSQL</p></li><li><p>Up to 15 read replicas</p></li><li><p>Six copies of data across 3 AZs and continuous backup to S3</p></li><li><p>AWS DMS (Database Migration Service)</p></li></ul><h1 id="new-tools"><a class="markdownIt-Anchor" href="#new-tools"></a> New Tools</h1><blockquote><p>Data tools are not competing with each other; they complement each other.<br>Pick the use case, then apply the corresponding tech</p></blockquote><ul><li>RDB</li><li>Key-value</li><li>Document</li><li>In-memory</li><li>Graph (Neptune)</li><li>Time-Series</li><li>Ledger</li></ul><h2 id="rdb-key-value-graph"><a class="markdownIt-Anchor" href="#rdb-key-value-graph"></a> RDB Key-value Graph</h2><p>RDB: data integrity; transactions<br>Key-value: partitioned by key, consistent performance at scale<br>Graph: <strong>Vertices</strong> and Edges</p><h2 id="case-study"><a class="markdownIt-Anchor" href="#case-study"></a> Case Study</h2><ul><li><p>Airbnb</p><ul><li>DynamoDB for user search history</li><li>ElastiCache: caching</li><li>RDS: transaction data</li></ul></li><li><p>A book store</p><ul><li>Used DynamoDB (key-value) to store book information</li><li>Elasticsearch: stream DynamoDB changes to trigger a Lambda that writes them into the Elasticsearch index</li><li>Leaderboard: use ElastiCache; (???) 
sorting</li><li>Recommendation engine – use a graph DB to record people, books, and purchases</li></ul></li></ul><h2 id="ledger-database"><a class="markdownIt-Anchor" href="#ledger-database"></a> Ledger Database</h2><p>Industries: Healthcare, Government, Manufacturing, HR&Payroll</p><ul><li>I want the data to be immutable, traceable, and cryptographically verifiable</li><li>Blockchain is hard to maintain</li><li>Amazon Quantum Ledger Database: immutable, cryptographically verifiable, highly scalable, easy to use</li></ul><h2 id="time-series-data-aws-timestream"><a class="markdownIt-Anchor" href="#time-series-data-aws-timestream"></a> Time Series Data – AWS Timestream</h2><p>What kind of data is time-series data?</p><ul><li>Weather; IoT; DevOps data</li><li>Time-series data always has time as the x axis; the y values are flexible and can change in flight</li><li>Data moves from hot -> warm -> cold storage</li><li>Millions of inserts (10M/second); serverless; trillions of daily events</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>Databases on AWS: The Right Tool for the Right Job (good presentation)<br><a href="https://youtu.be/-pb-DkD6cWg" target="_blank" rel="noopener">https://youtu.be/-pb-DkD6cWg</a></p></blockquote>]]></content>
<tags>
<tag> AWS </tag>
<tag> Differentiation </tag>
</tags>
</entry>
<entry>
<title>AWS - EFS</title>
<link href="2019/07/22/markdown/AWS/AWS2019/EFS/"/>
<url>2019/07/22/markdown/AWS/AWS2019/EFS/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/4FQvJ2q6_oA" target="_blank" rel="noopener">https://youtu.be/4FQvJ2q6_oA</a></p></blockquote><ul><li>AWS has 3 main adoption patterns, which map to 3 storage categories<ul><li>Re-Hosting – Block Storage</li><li>Re-Platform – File Storage – EFS</li><li>Re-Architecting – Object Storage</li></ul></li></ul><h2 id="whats-new"><a class="markdownIt-Anchor" href="#whats-new"></a> What’s new</h2><ul><li>EFS only supports Linux; the new FSx for Windows File Server covers Windows</li><li>FSx for Lustre</li><li>Supports multi-VPC access</li><li>AWS DataSync: initial full copy, then incremental transfers of changed data to the cloud; multi-threaded</li><li>TCO example: 100 GB of standard storage plus 400 GB of Infrequent Access, around $50/month</li></ul><h2 id="deep-dive"><a class="markdownIt-Anchor" href="#deep-dive"></a> Deep Dive</h2><ul><li>Performance mode<ul><li>General Purpose: focuses on low latency (max 7k IOPS) – recommended starting point</li><li>Max I/O: focuses on I/O throughput (higher latencies)</li></ul></li><li>Throughput mode<ul><li>Bursting Throughput – recommended starting point</li><li>Provisioned Throughput (you can decrease it once every 24 hours)</li></ul></li><li>EFS Infrequent Access (85% cheaper)<ul><li>Automatic lifecycle management (any file not accessed for more than 30 days)</li></ul></li></ul><h3 id="security-model"><a class="markdownIt-Anchor" href="#security-model"></a> Security Model</h3><p>Network access via ACLs; file access via POSIX permissions or IAM; encryption; compliance with HIPAA etc.</p><h2 id="use-case"><a class="markdownIt-Anchor" href="#use-case"></a> Use case</h2><ul><li>Atlassian - JIRA</li><li>T-Mobile<ul><li>K8S with EFS (Persistent Volumes for 100s of nodes)</li><li>Cache build dependencies in CI/CD (Maven dependencies, for example)</li><li>Centralized repository</li><li>Tibco EMS HA</li></ul></li></ul><h2 id="best-practices"><a 
class="markdownIt-Anchor" href="#best-practices"></a> Best Practices</h2><ul><li>Throughput<ul><li>Multiple threads</li><li>Multiple directories</li><li>Use large I/O (aggregate I/O)</li></ul></li><li>IOPS<ul><li>Multiple threads</li><li>Multiple directories</li></ul></li><li>Use CloudWatch to monitor</li></ul><h1 id="choose-the-right-performance-with-file-system-2018"><a class="markdownIt-Anchor" href="#choose-the-right-performance-with-file-system-2018"></a> Choose the Right performance with File System (2018)</h1><ul><li>Since 2018, EFS supports provisioned throughput</li><li>Similar to EBS provisioning, but independent of the storage size; can be modified using the CLI<ul><li>Auto-provisioning of throughput is under consideration but not available yet</li></ul></li><li>Demo<ul><li>using ioping: a tool to monitor I/O latency in real time</li><li>using nload to monitor network status (because EFS is mounted over the network)</li><li>multiple threads increase the throughput</li><li>use the aws efs CLI to update the throughput limit; the change takes effect in flight</li></ul></li><li>The EFS mount helper can help you figure out what configuration you need</li><li>EFS ships ready-to-use CloudWatch metrics to help you set up, monitor, and tune EFS</li></ul><p>Usage scenarios: web, CI/CD, dev, big data, ML, DB backup<br>Compliance: healthcare, PCI (payment data); encryption at rest and in transit are both supported (no extra cost, but with a performance impact); built-in support for KMS and CMKs.<br>Soft limit with EFS: 1G/sec in all regions (can be increased on request)<br><strong>EFS File Sync</strong>: new feature used to migrate local data into EFS, multi-threaded and secure</p><p>Security: KMS CMK</p><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/TS1wS_Wb6PA" target="_blank" rel="noopener">https://youtu.be/TS1wS_Wb6PA</a></p><blockquote></blockquote><p><a href="https://github.com/koct9i/ioping" 
target="_blank" rel="noopener">https://github.com/koct9i/ioping</a></p><blockquote></blockquote><p><a href="https://aws.amazon.com/blogs/aws/efs-file-sync-faster-file-transfer-to-amazon-efs-file-systems/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/efs-file-sync-faster-file-transfer-to-amazon-efs-file-systems/</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> EFS </tag>
</tags>
</entry>
<entry>
<title>AWS - GreenGrass</title>
<link href="2019/07/04/markdown/AWS/AWS2018/GreenGrass/"/>
<url>2019/07/04/markdown/AWS/AWS2018/GreenGrass/</url>
<content type="html"><![CDATA[]]></content>
<tags>
<tag> AWS </tag>
<tag> IoT </tag>
</tags>
</entry>
<entry>
<title>AWS - S3</title>
<link href="2019/06/21/markdown/AWS/AWS2018/TroubleShooting/s3_access/"/>
<url>2019/06/21/markdown/AWS/AWS2018/TroubleShooting/s3_access/</url>
<content type="html"><![CDATA[<h1 id="trouble-shooting-public-object-access-denied"><a class="markdownIt-Anchor" href="#trouble-shooting-public-object-access-denied"></a> Troubleshooting: Public Object Access Denied</h1><ul><li>The ACL and the bucket policy are both set to public</li><li>Account-level and bucket-level settings both allow public access</li><li>Observation: objects uploaded from the console work; objects uploaded from another account fail.</li></ul><p>Add the option below to the upload to specify public access and to give the original bucket user full control</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">--acl public-read</span><br></pre></td></tr></table></figure><h1 id="use-aws-js-s3-explorer"><a class="markdownIt-Anchor" href="#use-aws-js-s3-explorer"></a> use aws-js-s3-explorer</h1><p><a href="https://github.com/awslabs/aws-js-s3-explorer" target="_blank" rel="noopener">https://github.com/awslabs/aws-js-s3-explorer</a></p>]]></content>
<tags>
<tag> troubleshooting </tag>
<tag> AWS </tag>
<tag> S3 </tag>
</tags>
</entry>
<entry>
<title>Kimball Dimensional Modeling Techniques Overview</title>
<link href="2019/06/17/markdown/Datawarehouse/02_KimballDimensionalModelingTechniquesOverview/"/>
<url>2019/06/17/markdown/Datawarehouse/02_KimballDimensionalModelingTechniquesOverview/</url>
<content type="html"><![CDATA[<h1 id="fundamental-concepts"><a class="markdownIt-Anchor" href="#fundamental-concepts"></a> Fundamental Concepts</h1><h2 id="gather-business-requirements-and-data-realities"><a class="markdownIt-Anchor" href="#gather-business-requirements-and-data-realities"></a> Gather Business Requirements and Data Realities</h2><p>samples in the book</p><p>Chapter 1 DW/BI and Dimensional Modeling Primer , p 5<br>Chapter 3 Retail Sales , p 70<br>Chapter 11 Telecommunications , p 297<br>Chapter 17 Lifecycle Overview , p 412<br>Chapter 18 Dimensional Modeling Process and Tasks , p 431<br>Chapter 19 ETL Subsystems and Techniques ,p 444</p><h2 id="collaborative-dimensional-modeling-workshops"><a class="markdownIt-Anchor" href="#collaborative-dimensional-modeling-workshops"></a> Collaborative Dimensional Modeling Workshops</h2><p>Dimension models should be designed by folks who fully understand the business and their needs.</p><h2 id="four-step-dimensional-design-process"><a class="markdownIt-Anchor" href="#four-step-dimensional-design-process"></a> Four-Step Dimensional Design process</h2><ul><li>Select the business Process</li><li>Declare the Grain</li><li>Identify the Dimensions</li><li>Identify the facts</li></ul><h2 id="business-processes"><a class="markdownIt-Anchor" href="#business-processes"></a> Business Processes</h2><p>Operational Activities</p><h2 id="grain"><a class="markdownIt-Anchor" href="#grain"></a> Grain</h2><p>The grain establishes exactly what a single fact table row represents.</p><h2 id="dimensions-for-descriptive-context"><a class="markdownIt-Anchor" href="#dimensions-for-descriptive-context"></a> Dimensions for Descriptive Context</h2><h2 id="facts-for-measurements"><a class="markdownIt-Anchor" href="#facts-for-measurements"></a> Facts for Measurements</h2><h2 id="star-schemas-and-olap-cubes"><a class="markdownIt-Anchor" href="#star-schemas-and-olap-cubes"></a> Star Schemas and OLAP Cubes</h2><h2 
id="graceful-extensions-to-dimensional-models"><a class="markdownIt-Anchor" href="#graceful-extensions-to-dimensional-models"></a> Graceful Extensions to Dimensional Models</h2><ul><li>Add a column to the fact table to hold a new fact</li><li>Add a column to the fact table to hold a foreign key to a new dimension table</li><li>Add a column to a dimension table to add an attribute</li></ul><h1 id="basic-fact-table-techniques"><a class="markdownIt-Anchor" href="#basic-fact-table-techniques"></a> Basic Fact Table Techniques</h1><h2 id="fact-table-structure"><a class="markdownIt-Anchor" href="#fact-table-structure"></a> Fact Table Structure</h2><p>A fact table contains the numeric measures produced by an operational measurement event in the real world.</p><h2 id="additive-semi-additive-non-additive-facts"><a class="markdownIt-Anchor" href="#additive-semi-additive-non-additive-facts"></a> Additive, Semi-Additive, Non-Additive Facts</h2><p>Balance amounts are common semi-additive facts because they are additive across all dimensions except time.<br>Some measures are completely non-additive, such as ratios.</p><h2 id="nulls-in-fact-tables"><a class="markdownIt-Anchor" href="#nulls-in-fact-tables"></a> Nulls in Fact Tables</h2><p><strong>Nulls must be avoided in the fact table’s foreign keys</strong></p><h2 id="conformed-facts"><a class="markdownIt-Anchor" href="#conformed-facts"></a> Conformed facts</h2><p>When the same fact appears in different fact tables, it must use the same name</p><h2 id="transaction-fact-tables"><a class="markdownIt-Anchor" href="#transaction-fact-tables"></a> Transaction Fact Tables</h2><h2 id="periodic-snapshot-fact-tables"><a class="markdownIt-Anchor" href="#periodic-snapshot-fact-tables"></a> Periodic Snapshot Fact Tables</h2><h2 id="factless-fact-tables"><a class="markdownIt-Anchor" href="#factless-fact-tables"></a> Factless Fact Tables</h2><p>Example: a table recording whether students attended school.</p><h2 id="aggregate-fact-tables"><a class="markdownIt-Anchor" 
href="#aggregate-fact-tables"></a> Aggregate Fact Tables</h2><p>Used to accelerate query performance.</p><h2 id="consolidated-fact-table"><a class="markdownIt-Anchor" href="#consolidated-fact-table"></a> Consolidated Fact Table</h2><p>Sales actuals and sales forecasts are saved into the same table; this design makes analysis easy but makes the ETL harder.</p><h1 id="basic-dimension-table-techniques"><a class="markdownIt-Anchor" href="#basic-dimension-table-techniques"></a> Basic Dimension Table Techniques</h1><h2 id="dimension-surrogate-key"><a class="markdownIt-Anchor" href="#dimension-surrogate-key"></a> Dimension Surrogate Key</h2><ul><li>Structure: wide, flat, denormalized tables with many low-cardinality text attributes.</li><li>Single primary key<ul><li>Don’t use the operational system’s natural key as the primary key</li><li>An anonymous integer primary key is recommended; the date dimension is exempt from this rule.</li></ul></li></ul><h2 id="natural-durable-and-supernatural-key"><a class="markdownIt-Anchor" href="#natural-durable-and-supernatural-key"></a> Natural, Durable and Supernatural key</h2><ul><li><p>A natural key is generated by the business system</p></li><li><p>A durable / supernatural key is generated by the DW to indicate that, although the natural key has changed, it is still the same object (for example, an employee who rejoined).</p></li><li><p>Drilling down: a fundamental data analysis method</p></li><li><p>Degenerate Dimensions</p></li></ul><blockquote><p>Example: an invoice with multiple items. The item fact table holds all the dimensions as foreign keys. The invoice number then becomes a dimension for the item fact table, but it has no attributes of its own. So the invoice number is kept in the fact table as a degenerate dimension, with no dimension table behind it. 
This kind of dimension is helpful with transaction and accumulating snapshot fact tables.</p></blockquote><ul><li>Use full text words in dimension attributes instead of cryptic abbreviations, flags, etc.</li><li>Why use a date dimension instead of computing in SQL: because the date dimension carries more attributes, such as week number, holiday, fiscal period, etc.<ul><li>The date/time dimension table also needs a default row, like a normal dimension table</li></ul></li><li>Role-playing dimension: a dimension that is defined once but referenced multiple times in one fact table, with a different meaning each time. For example, the time dimension.</li><li>Junk dimension: when a transaction has lots of low-value dimensions, some of them can be combined into one dimension.</li><li>Snowflaked dimensions: when the dimension tables are all normalized.</li><li>Outrigger dimensions: when a dimension references another dimension.<ul><li>For example, a dimension that refers to the date dimension.</li><li>The principle is that all dimensions support the fact table directly. 
The fact table should never need to go through one dimension to get the key of another dimension.</li></ul></li></ul><h1 id="integration-via-conformed-dimensions"><a class="markdownIt-Anchor" href="#integration-via-conformed-dimensions"></a> Integration via Conformed Dimensions</h1><h1 id="dealing-with-slowly-changing-dimension-attributes"><a class="markdownIt-Anchor" href="#dealing-with-slowly-changing-dimension-attributes"></a> Dealing with Slowly Changing Dimension Attributes</h1><h1 id="dealing-with-dimension-hierarchies"><a class="markdownIt-Anchor" href="#dealing-with-dimension-hierarchies"></a> Dealing with Dimension Hierarchies</h1><h2 id="fixed-depth-positional-hierarchies"><a class="markdownIt-Anchor" href="#fixed-depth-positional-hierarchies"></a> Fixed Depth Positional Hierarchies</h2><h2 id="slightly-raggedvariable-depth-hierarchies"><a class="markdownIt-Anchor" href="#slightly-raggedvariable-depth-hierarchies"></a> Slightly Ragged/Variable Depth Hierarchies</h2><h2 id="raggedvariable-depth-hierarchies-with-hierarchy-bridge-tables"><a class="markdownIt-Anchor" href="#raggedvariable-depth-hierarchies-with-hierarchy-bridge-tables"></a> Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables</h2><h2 id="raggedvariable-depth-hierarchies-with-pathstring-attributes"><a class="markdownIt-Anchor" href="#raggedvariable-depth-hierarchies-with-pathstring-attributes"></a> Ragged/Variable Depth Hierarchies with Pathstring Attributes</h2><h1 id="advanced-fact-table-techniques"><a class="markdownIt-Anchor" href="#advanced-fact-table-techniques"></a> Advanced Fact Table Techniques</h1><h1 id="advanced-dimension-techniques"><a class="markdownIt-Anchor" href="#advanced-dimension-techniques"></a> Advanced Dimension Techniques</h1><h1 id="special-purpose-schemas"><a class="markdownIt-Anchor" href="#special-purpose-schemas"></a> Special Purpose Schemas</h1>]]></content>
<tags>
<tag> Datawarehouse </tag>
<tag> Kimball </tag>
</tags>
</entry>
<entry>
<title>Kimball Dimensional Modeling Techniques applied to Inventory Sample</title>
<link href="2019/06/17/markdown/Datawarehouse/04_Inventory/"/>
<url>2019/06/17/markdown/Datawarehouse/04_Inventory/</url>
<content type="html"><![CDATA[<h1 id="value-chain-introduction"><a class="markdownIt-Anchor" href="#value-chain-introduction"></a> Value Chain Introduction</h1><p>Three inventory models are introduced for the value chain.</p><h2 id="inventory-periodic-model"><a class="markdownIt-Anchor" href="#inventory-periodic-model"></a> Inventory Periodic Model</h2><ul><li>Scenario: a grocery chain with 60,000 products * 100 stores; with a daily periodic snapshot there would be 60k * 100 = 6 million records per day.</li><li>Estimation<ul><li>14 bytes per row * 6 million rows = 84 MB per day; 3 years would be 84 MB * 1,095 days, about 92 GB of data</li><li>or keep 60 days of daily snapshots and archive older data as weekly snapshots</li></ul></li><li>Semi-Additive Facts<ul><li>Be careful with SQL AVG when summarizing: inventory levels are not additive across time, so average over the number of time periods rather than over the number of rows</li></ul></li><li>Enhanced Inventory Facts<ul><li>Add more columns to the fact table, including quantity on hand and quantity sold<ul><li>daily quantity sold / daily quantity on hand = number of turns for the day</li><li>quantity sold over the whole year / average daily quantity on hand = number of turns for the year</li><li>estimated number of days’ supply = current quantity on hand / average quantity sold per day</li></ul></li><li>Add inventory value at cost and inventory value at the latest selling price</li></ul></li></ul><h2 id="inventory-transactions-model"><a class="markdownIt-Anchor" href="#inventory-transactions-model"></a> Inventory Transactions model</h2><p>P117</p>]]></content>
<tags>
<tag> Datawarehouse </tag>
<tag> Kimball </tag>
</tags>
</entry>
<entry>
<title>Data Warehousing, Business Intelligence, and Dimensional Modeling Primer</title>
<link href="2019/06/02/markdown/Datawarehouse/Overview/"/>
<url>2019/06/02/markdown/Datawarehouse/Overview/</url>
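<content type="html"><![CDATA[<h1 id="key-difference-between-operational-system-and-data-warehouse"><a class="markdownIt-Anchor" href="#key-difference-between-operational-system-and-data-warehouse"></a> Key difference between operational system and Data warehouse</h1><ul><li>One is where data is put in; the other is where data is queried out</li><li>One requires transactions and an accurate current state, with business logic that strictly follows the process; the other requires large volumes of queries and comparisons, with query requirements that keep changing</li></ul><h1 id="goals-of-data-warehousing-and-business-intelligence"><a class="markdownIt-Anchor" href="#goals-of-data-warehousing-and-business-intelligence"></a> Goals of Data Warehousing and Business Intelligence</h1><ul><li>The data we collect is hard to use</li><li>The data we collect is not query-friendly</li><li>It is inconvenient for business users</li><li>The data is inconsistent</li><li>We want to make fact-based decisions</li></ul><p>So the DW needs the following:</p><ul><li>Data that is close to business users and easy to understand</li><li>Consistency: the same name must mean the same thing</li><li>The ability to adapt to changing requirements, with changes transparent to users</li><li>Timely data, even though it has to be cleaned and validated</li><li>Strong data security: the information in the DW reveals what a company sells, to whom, and at what price</li><li>The DW is a decision support system</li><li>The DW succeeds only if business users accept and use it; unlike operational systems, the DW is optional, and it will be abandoned if it is hard to use</li></ul><h2 id="publishing-metaphor-for-dwbi-managers"><a class="markdownIt-Anchor" href="#publishing-metaphor-for-dwbi-managers"></a> Publishing Metaphor for DW/BI Managers</h2><p>Think of the DW as publishing a magazine. The DW needs to:</p><ul><li>Understand its readers</li><li>Delight its readers</li><li>Sustain publication</li></ul><p>Like a magazine publisher, the DW must select its data sources, ensure the data is accurate, present it to readers (users) in the right way, and update it regularly.</p><h1 id="dimensional-modeling-introduction"><a class="markdownIt-Anchor" href="#dimensional-modeling-introduction"></a> Dimensional Modeling Introduction</h1><p>Dimensional modeling addresses two hard requirements:</p><ul><li>It is easy for business users to understand</li><li>Queries are fast</li></ul><p>“We sell products in various markets and measure our performance over time”<br>This sentence implies 3 dimensions: “product”, “market”, and “time”</p><p>Dimensional models are often implemented in relational databases, but they differ from 3NF (third normal form) models.</p><ul><li>The goal of 3NF is to remove redundancy, and it is an ER (entity-relationship) model; a dimensional model is also an ER model</li><li>The key difference between 3NF and dimensional models is the degree of normalization</li><li>3NF is more highly normalized, so it is usually called a normalized model</li><li>The drawbacks of 3NF are complexity and poor query performance</li><li>A dimensional model is easy for users to understand, queries perform well, and it adapts easily to changing business requirements</li></ul><h2 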
id="star-schemas-versus-olap-cubes"><a class="markdownIt-Anchor" href="#star-schemas-versus-olap-cubes"></a> Star Schemas Versus OLAP Cubes</h2><ul><li>A dimensional model implemented in a relational database is a star schema</li><li>A dimensional model implemented in a multidimensional database is an OLAP data cube</li></ul><h2 id="olap-deployment-considerations"><a class="markdownIt-Anchor" href="#olap-deployment-considerations"></a> OLAP Deployment Considerations</h2><ul><li>The star schema is the foundation</li><li>OLAP’s performance advantage is being eroded by newer technology (for example in-memory databases and columnar DBs)</li><li>OLAP cube designs are often tied to a technology vendor, so portability is poor.</li><li>OLAP has better data security; users can be restricted to seeing summaries only</li><li>OLAP has more powerful analysis capabilities</li><li>OLAP has better support for changing dimensions</li><li>OLAP supports snapshot facts but not accumulating snapshots</li><li>OLAP has good support for querying hierarchies and similar data</li></ul><h2 id="fact-tables-for-measurements"><a class="markdownIt-Anchor" href="#fact-tables-for-measurements"></a> Fact Tables for Measurements</h2><ul><li>Each row in a fact table corresponds to a measurement event. It cannot be split.</li></ul><blockquote><p>The idea that a measurement event in the physical world has a one-to-one<br>relationship to a single row in the corresponding fact table is a bedrock principle<br>for dimensional modeling</p></blockquote><ul><li>Facts are often described as continuously valued to help sort out what is a fact<br>versus a dimension attribute.<ul><li>Additive fact: sales amount</li><li>Semi-additive fact: for example, account balance</li><li>Non-additive fact: for example, unit price</li></ul></li><li>Textual facts: usually absent; if one exists, try to move it into a dimension</li><li>Empty item. 
A fact table should contain only events that actually happened; do not try to insert zeros for events that did not happen.</li><li>Fact tables are usually very sparse; they usually take up about 90% of the storage; they usually have many rows and few columns; the number of rows can usually be estimated from the table size</li><li>There are three kinds of fact table: transaction, periodic snapshot, and accumulating snapshot.</li><li>A fact table has at least two foreign keys, which reference the primary keys of dimension tables</li><li><strong>Referential integrity</strong> ensures that every foreign key in a fact table row is valid</li><li><strong>Composite key</strong>: the primary key of a fact table is usually a composite of its foreign keys.</li></ul><h2 id="dimension-tables-for-descriptive-context"><a class="markdownIt-Anchor" href="#dimension-tables-for-descriptive-context"></a> Dimension Tables for Descriptive Context</h2><ul><li>Dimension tables define the textual context of measurable business events</li><li>Dimension tables describe the who, what, when, where, how, and why</li><li>Dimension tables usually have many columns; 50 to 100 is normal</li><li>Dimension tables usually have few rows and many columns</li><li>A dimension table has a single primary key</li><li>Dimension attributes are the main source of query constraints, groupings, and report labels</li><li>Dimension attribute names must carry business meaning</li><li>For example, if the first two characters of a code mean one thing and the next two mean another, split them into separate dimension attributes at design time instead of making users manipulate strings in their queries</li><li>In practice, to decide whether a numeric value is a fact or a dimension attribute, check whether it takes part in calculations and whether its values are continuous or discrete</li></ul><h2 id="facts-and-dimensions-joined-in-a-star-schema"><a class="markdownIt-Anchor" href="#facts-and-dimensions-joined-in-a-star-schema"></a> Facts and Dimensions Joined in a Star Schema</h2><p>Benefits of the star schema</p><ul><li><p>Easy to understand</p></li><li><p>Simplicity brings performance benefits</p></li><li><p>Dimensional models are gracefully extensible to accommodate change.</p><ul><li>Facts won’t change, but dimension values can.</li><li>Adding new rows to a dimension table, or altering the fact table to add a new dimension foreign key, fulfills the change requirement</li></ul><p>A sample SQL query against a star schema</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">store.district_name,</span><br><span class="line">product.brand,</span><br><span class="line"><span class="keyword">sum</span>(sales_facts.sales_dollars) <span class="keyword">AS</span> <span class="string">"Sales Dollars"</span></span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line"><span class="keyword">store</span>,</span><br><span class="line">product,</span><br><span class="line"><span class="built_in">date</span>,</span><br><span class="line">sales_facts</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">date.month_name=<span class="string">'January'</span> <span class="keyword">AND</span></span><br><span class="line">date.year=<span class="number">2013</span> <span class="keyword">AND</span></span><br><span class="line">store.store_key = sales_facts.store_key <span class="keyword">AND</span></span><br><span class="line">product.product_key = sales_facts.product_key <span class="keyword">AND</span></span><br><span class="line">date.date_key = sales_facts.date_key</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">store.district_name,</span><br><span class="line">product.brand</span><br></pre></td></tr></table></figure><p>The WHERE clause first applies the filters, then joins the fact table to the dimensions; the GROUP BY clause establishes the aggregation.</p></li></ul><p>P54.</p><h1 id="kimballs-dwbi-architecture"><a class="markdownIt-Anchor" href="#kimballs-dwbi-architecture"></a> Kimball’s DW/BI Architecture</h1><p>4 components: Operational 
Source Systems, ETL system, data presentation area, BI applications</p><h2 id="operational-source-systems"><a class="markdownIt-Anchor" href="#operational-source-systems"></a> Operational Source Systems</h2><ul><li>Focus: performance and availability</li><li>Maintain little historical data</li></ul><h2 id="extract-transformation-and-load-system"><a class="markdownIt-Anchor" href="#extract-transformation-and-load-system"></a> Extract, Transformation, and Load System</h2><ul><li>Extraction: move the data into the DW scope</li><li>Transformation: enrich, de-duplicate, etc.</li><li>Load the data into the dimensional model<ul><li>including surrogate key assignment</li></ul></li></ul><blockquote><p>Industry debate: should the ETL landing area use a normalized structure? It is not required.</p></blockquote><h2 id="presentation-area-to-support-business-intelligence"><a class="markdownIt-Anchor" href="#presentation-area-to-support-business-intelligence"></a> Presentation Area to Support Business Intelligence</h2><ul><li>Baseline: data must be presented as dimensional schemas or OLAP cubes; this has been accepted by the industry</li><li>The presentation area must contain atomic data (not only summary data);<ul><li>it’s <strong>unacceptable</strong> to put atomic data into a 3NF model and only put summary data into a star schema (WRONG)</li></ul></li><li>The data area should be organized around business process measurement and should cross organizational department boundaries.</li><li>When the bus architecture is used as a framework, you can develop the enterprise data warehouse in an agile, decentralized, realistically scoped, iterative manner.</li></ul><blockquote><p>Data in the queryable presentation area of the DW/BI system must be dimensional, atomic (complemented by performance-enhancing aggregates), business process-centric, and adhere to the enterprise data warehouse bus architecture. The data must not be structured according to individual departments’ interpretation of the data.</p></blockquote><h2 id="business-intelligence-applications"><a 
class="markdownIt-Anchor" href="#business-intelligence-applications"></a> Business Intelligence Applications</h2><ul><li>Tableau (??)</li></ul><p>P59</p><h2 id="restaurant-metaphor-for-the-kimball-architecture"><a class="markdownIt-Anchor" href="#restaurant-metaphor-for-the-kimball-architecture"></a> Restaurant Metaphor for the Kimball Architecture</h2><ul><li>ETL: back-room kitchen</li></ul><p>ETL should focus on:<br><strong>Quality</strong><br><strong>Consistency</strong><br><strong>Integrity</strong></p><p>The ETL back room should be off limits to DW/BI patrons.</p><ul><li>Data Presentation and BI: Front Dining Room</li></ul><p>Focus: properly organized and utilized to deliver the presentation area’s food, decor, service, and cost as needed.</p><h1 id="alternative-dwbi-architectures"><a class="markdownIt-Anchor" href="#alternative-dwbi-architectures"></a> Alternative DW/BI Architectures</h1><h2 id="independent-data-mart-architecture"><a class="markdownIt-Anchor" href="#independent-data-mart-architecture"></a> Independent Data Mart Architecture</h2><ul><li>Data passes through separate ETL logic and lands in multiple models, each designed for a different front room.</li><li>No centralized data governance</li><li>Low cost in the short term; each model usually already applies a star schema</li></ul><h2 id="hub-and-spoke-corporate-information-factory-inmon-architecture"><a class="markdownIt-Anchor" href="#hub-and-spoke-corporate-information-factory-inmon-architecture"></a> Hub-and-Spoke Corporate Information Factory Inmon Architecture</h2><ul><li>3NF is enforced</li></ul><h2 id="hybrid-hub-and-spoke-and-kimball-architecture"><a class="markdownIt-Anchor" href="#hybrid-hub-and-spoke-and-kimball-architecture"></a> Hybrid Hub-and-Spoke and Kimball Architecture</h2><ul><li>2 layers of ETL</li><li>Source -> 3NF -> Kimball</li></ul><p>P66</p><h1 id="dimensional-modeling-myths"><a class="markdownIt-Anchor" href="#dimensional-modeling-myths"></a> Dimensional Modeling Myths</h1><h2 
id="myth-1-dimensional-only-for-summary-data"><a class="markdownIt-Anchor" href="#myth-1-dimensional-only-for-summary-data"></a> Myth 1: Dimensional models are only for summary data</h2><p>Summary data should complement the granular details solely to provide improved performance for common queries, <strong>but not replace the details.</strong><br>The amount of history in dimensional models must be driven only by business requirements, not by performance concerns.</p><h2 id="myth-2-dimensional-for-departmental"><a class="markdownIt-Anchor" href="#myth-2-dimensional-for-departmental"></a> Myth 2: Dimensional models are departmental</h2><h2 id="myth-3-dimensional-are-not-scalable"><a class="markdownIt-Anchor" href="#myth-3-dimensional-are-not-scalable"></a> Myth 3: Dimensional models are not scalable</h2><p>It’s common for fact tables to have billions of rows; fact tables containing 2 trillion rows have been reported.<br>The key difference between 3NF and dimensional models is that dimensional models are easier to understand.</p><h2 id="myth-4-dimensional-only-for-predictable-usage"><a class="markdownIt-Anchor" href="#myth-4-dimensional-only-for-predictable-usage"></a> Myth 4: Dimensional models are only for predictable usage</h2><p>The model is centered on measurement processes, not predefined reports or analyses.<br>“God is in the details”</p><h2 id="myth-5-dimensional-cant-be-integrated"><a class="markdownIt-Anchor" href="#myth-5-dimensional-cant-be-integrated"></a> Myth 5: Dimensional models can’t be integrated</h2><p>Data integration depends on standardized labels, values, and definitions.</p><h1 id="more-reasons-to-think-dimensionally"><a class="markdownIt-Anchor" href="#more-reasons-to-think-dimensionally"></a> More Reasons to Think Dimensionally</h1><p>Robust dimensions translate into robust DW/BI systems.</p><h1 id="agile-considerations"><a class="markdownIt-Anchor" href="#agile-considerations"></a> Agile Considerations</h1><h1 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h1>]]></content>
<tags>
<tag> Datawarehouse </tag>
<tag> BI </tag>
</tags>
</entry>
<entry>
<title>PostgreSQL</title>
<link href="2019/06/01/markdown/BackToBasic/Postgres/Management/"/>
<url>2019/06/01/markdown/BackToBasic/Postgres/Management/</url>
<content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/</a></p><p>CREATE ROLE readwrite;<br>GRANT CONNECT ON DATABASE "Datawarehouse" TO readwrite;<br>GRANT USAGE ON SCHEMA "dw_cons" TO readwrite;<br>GRANT USAGE, CREATE ON SCHEMA "dw_cons" TO readwrite;<br>GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA "dw_cons" TO readwrite;<br>ALTER DEFAULT PRIVILEGES IN SCHEMA "dw_cons" GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite;<br>GRANT USAGE ON ALL SEQUENCES IN SCHEMA "dw_cons" TO readwrite;<br>ALTER DEFAULT PRIVILEGES IN SCHEMA "dw_cons" GRANT USAGE ON SEQUENCES TO readwrite;</p><p>GRANT readonly TO "tableau_read";<br>GRANT readwrite TO "tibco_write";</p>]]></content>
<tags>
<tag> basic </tag>
<tag> PostgreSQL </tag>
</tags>
</entry>
<entry>
<title>Google Cloud Study Jam</title>
<link href="2019/05/14/markdown/Trending/Google/CloudStudyJam/"/>
<url>2019/05/14/markdown/Trending/Google/CloudStudyJam/</url>
<content type="html"><![CDATA[<p>gcloud ai-platform local predict \<br>--model-dir output/export/census/1557796906 \<br>--json-instances …/test.json</p><p>MODEL_BINARIES=$OUTPUT_PATH/export/census/1557797507/</p>]]></content>
<tags>
<tag> Jam </tag>
</tags>
</entry>
<entry>
<title>AWS - Site to Site VPN</title>
<link href="2019/05/11/markdown/AWS/AWS2018/Site2SiteVPN/"/>
<url>2019/05/11/markdown/AWS/AWS2018/Site2SiteVPN/</url>
<content type="html"><![CDATA[<h1 id="basic-steps"><a class="markdownIt-Anchor" href="#basic-steps"></a> Basic Steps</h1><h2 id="cloudformation"><a class="markdownIt-Anchor" href="#cloudformation"></a> CloudFormation</h2><ul><li>VPC with only a private subnet; route table declared</li><li>VGW created and attached to the VPC</li><li>Route propagation enabled from the VGW to the route table</li><li>CGW information declared</li></ul><h2 id="create-site2sitevpn"><a class="markdownIt-Anchor" href="#create-site2sitevpn"></a> Create Site2SiteVPN</h2><ul><li><p>Pay attention to the IPsec tunnel interconnection IP CIDR<br><a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-vpnconnection-vpntunneloptionsspecification.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-vpnconnection-vpntunneloptionsspecification.html</a></p></li><li><p>Download the configuration and run it from the client side</p><ul><li>Pay attention to the propagation CIDR</li></ul></li></ul><p>Client Side</p><ol><li>Confirm the customer gateway supports BGP</li><li>Allocate the IPsec tunnel interconnection IP CIDR</li><li>Allocate the AWS VPC IP range</li><li>Confirm the data centre propagating IP ranges (default will be 0.0.0.0/0)</li></ol>]]></content>
<tags>
<tag> AWS </tag>
<tag> Site2SiteVPN </tag>
</tags>
</entry>
<entry>
<title>EV3 Project</title>
<link href="2019/03/10/markdown/Trending/EV3/Loading/"/>
<url>2019/03/10/markdown/Trending/EV3/Loading/</url>
<content type="html"><![CDATA[<h1 id="preparation"><a class="markdownIt-Anchor" href="#preparation"></a> Preparation</h1><p>Flash the machine<br><a href="https://sites.google.com/site/ev3devpython/setting-up-vs-code" target="_blank" rel="noopener">https://sites.google.com/site/ev3devpython/setting-up-vs-code</a></p><p>Connecting with mac<br><a href="https://www.ev3dev.org/docs/tutorials/connecting-to-ev3dev-with-ssh/" target="_blank" rel="noopener">https://www.ev3dev.org/docs/tutorials/connecting-to-ev3dev-with-ssh/</a></p><p>issues<br><a href="https://github.com/ev3dev/ev3dev/issues/1220" target="_blank" rel="noopener">https://github.com/ev3dev/ev3dev/issues/1220</a></p><p>Wireless</p>]]></content>
<tags>
<tag> EV3 </tag>
<tag> Robotics </tag>
</tags>
</entry>
<entry>
<title>AWS - Notes about SSO with Azure</title>
<link href="2019/02/06/markdown/AWS/AWS2018/Azure_SSO_WithAWS/"/>
<url>2019/02/06/markdown/AWS/AWS2018/Azure_SSO_WithAWS/</url>
<content type="html"><![CDATA[<h1 id="update-single-azure-to-sso-to-multiple-aws"><a class="markdownIt-Anchor" href="#update-single-azure-to-sso-to-multiple-aws"></a> Update – Single Azure to SSO to multiple AWS</h1><ul><li>The identifier must be unique; it can be any string</li></ul><h1 id="config-azure-ad-sso-to-aws-console-via-smal"><a class="markdownIt-Anchor" href="#config-azure-ad-sso-to-aws-console-via-smal"></a> Config Azure AD SSO to AWS Console via SAML</h1><h2 id="azure-official-doc"><a class="markdownIt-Anchor" href="#azure-official-doc"></a> Azure Official Doc</h2><p><a href="https://docs.microsoft.com/en-us/azure/active-directory/saas-apps/amazon-web-service-tutorial" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/azure/active-directory/saas-apps/amazon-web-service-tutorial</a></p><h2 id="aditional-notes"><a class="markdownIt-Anchor" href="#aditional-notes"></a> Additional Notes</h2><p>The following settings do not match the doc above but were needed during configuration:</p><p>Example of claim key/values:</p><ul><li>Name: emailaddress</li><li>Namespace: <a href="http://schemas.xmlsoap.org/ws/2005/05/identity/claims" target="_blank" rel="noopener">http://schemas.xmlsoap.org/ws/2005/05/identity/claims</a></li><li>Source: Attribute</li><li>Source attribute: user.mail</li></ul><p>Full config as below</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br></pre></td><td class="code"><pre><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress</span><br><span class="line">user.mail</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname</span><br><span class="line">user.givenname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name</span><br><span class="line">user.userprincipalname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier</span><br><span class="line">user.userprincipalname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname</span><br><span class="line">user.surname</span><br><span class="line"></span><br><span class="line">https://aws.amazon.com/SAML/Attributes/Role</span><br><span class="line">user.assignedroles</span><br><span class="line"></span><br><span class="line">https://aws.amazon.com/SAML/Attributes/RoleSessionName</span><br><span class="line">user.userprincipalname</span><br></pre></td></tr></table></figure><p>After successful config, login via<br><a href="https://account.activedirectory.windowsazure.com/r#/applications" target="_blank" rel="noopener">https://account.activedirectory.windowsazure.com/r#/applications</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> Azure </tag>
<tag> SSO </tag>
</tags>
</entry>
<entry>
<title>AWS - RDS MySQL</title>
<link href="2018/08/24/markdown/AWS/AWS2018/09a_RDS_MySQL/"/>
<url>2018/08/24/markdown/AWS/AWS2018/09a_RDS_MySQL/</url>
<content type="html"><![CDATA[<h1 id="building-your-application-with-an-amazon-aurora-database-dem113"><a class="markdownIt-Anchor" href="#building-your-application-with-an-amazon-aurora-database-dem113"></a> Building Your Application with an Amazon Aurora Database (DEM113)</h1><p><a href="https://youtu.be/-ychuATbqPY" target="_blank" rel="noopener">https://youtu.be/-ychuATbqPY</a></p><h2 id="key-new-feature"><a class="markdownIt-Anchor" href="#key-new-feature"></a> Key New Features</h2><ul><li>Serverless: automatically provisions the computing power you need; scales up and down automatically.</li><li>Aurora parallel query<ul><li>An option when provisioning your DB, suitable for a DB used for both transactions and analysis</li><li><a href="https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/</a></li><li>No extra cost, but IO will be more expensive</li></ul></li><li>Enable Backtrack (select the backup window)<ul><li>Lets you backtrack the database; extra cost 10 USD/month</li></ul></li><li>Performance Insights<ul><li>breakdown by SQL and by user (session)</li></ul></li></ul><h1 id="running-a-high-performance-kubernetes-cluster-with-amazon-eks-con318-r1"><a class="markdownIt-Anchor" href="#running-a-high-performance-kubernetes-cluster-with-amazon-eks-con318-r1"></a> Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1)</h1><p><a href="https://youtu.be/YQWt6wdAZMU" target="_blank" rel="noopener">https://youtu.be/YQWt6wdAZMU</a></p><h2 id="optimize-pod-placement"><a class="markdownIt-Anchor" href="#optimize-pod-placement"></a> Optimize pod placement</h2><ol><li>Limit the resources</li><li>Density vs. 
Size of pods</li><li>Anti-affinity: keep CPU-heavy pods on different hosts</li></ol><h2 id="use-diagram-to-balance-the-design"><a class="markdownIt-Anchor" href="#use-diagram-to-balance-the-design"></a> Use diagram to balance the design</h2><ol><li>Anti-affinity</li><li>Secrets</li><li>Number of Nodes</li><li>Active Namespaces</li><li>Pod Churn</li><li>Pod Density</li><li>Networking</li></ol><h2 id="use-k8s-with-database"><a class="markdownIt-Anchor" href="#use-k8s-with-database"></a> Use K8S with Database</h2><p>When choosing the persistence layer you have 3 options: inside the pod, outside the pod but on the same box, or outside the box.</p><p>37:12</p><h1 id="data-migration"><a class="markdownIt-Anchor" href="#data-migration"></a> Data migration</h1><ul><li>take the backup from a replica or slave</li><li>compress the backup for transfer</li><li>use primary key sort order where possible</li><li>to speed up data loading: more memory + IOPS</li><li>disable binary logging</li><li>change some of the configuration to reduce how often the server writes logs to disk (because we are loading a dump, there are no in-flight transactions to protect)</li></ul><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> terminology</h2><p>binlog: transaction logs<br><a href="https://www.cnblogs.com/Cherie/p/3309503.html" target="_blank" rel="noopener">https://www.cnblogs.com/Cherie/p/3309503.html</a></p><p>The default is 0; V5.6 changed it to 1, but the performance impact is small</p><h2 id="data-loading-format"><a class="markdownIt-Anchor" href="#data-loading-format"></a> Data loading format</h2><p>SQL:</p><ul><li>easy and simple</li><li>for small db</li></ul><p>Flat files:</p><ul><li>schema load</li><li>fault tolerance (each file load is a separate transaction)</li></ul><h1 id="normal-steps-to-migrate-database"><a class="markdownIt-Anchor" href="#normal-steps-to-migrate-database"></a> Normal steps to migrate database</h1><h2 id="from-on-premise-to-rds"><a class="markdownIt-Anchor" href="#from-on-premise-to-rds"></a> 
From on-premise to RDS</h2><ul><li>configure the replication target and start replication</li><li>stop the application bound to the original source; stop replication after the new target catches up</li><li>promote the new target instance</li><li>point the application binding to the new instance.</li></ul><h2 id="from-rds-to-on-premise"><a class="markdownIt-Anchor" href="#from-rds-to-on-premise"></a> From RDS to on-premise</h2><ul><li>RDS provides point-in-time recovery</li></ul><h2 id="rds-data-to-redshift"><a class="markdownIt-Anchor" href="#rds-data-to-redshift"></a> RDS data to redshift</h2><ul><li>change the binlog format config to “ROW”</li></ul><h1 id="multi-az-fail-over"><a class="markdownIt-Anchor" href="#multi-az-fail-over"></a> Multi-AZ Fail Over</h1><p>Around 1 min for failover</p><ul><li>25 sec – detect failure</li><li>5 sec – promote standby</li><li>30 sec – CNAME (DNS) update</li><li>the standby sits in a different AZ; a read replica can sit in a different region</li></ul><h1 id="important-scaling-archi"><a class="markdownIt-Anchor" href="#important-scaling-archi"></a> important scaling archi</h1><p>(???)</p><ul><li>for read-intensive applications (for example 90% reads): create more read replicas</li><li>for write-intensive applications (for example 20%) ---- 2 SCALE</li></ul><h1 id="rebooting-performance"><a class="markdownIt-Anchor" href="#rebooting-performance"></a> Rebooting performance</h1><p>If MySQL is using InnoDB as its engine, you can warm the cache when rebooting to improve performance. 
The feature is called CacheWarmer Turned down (cache before turning down).</p><h1 id="handle-schema-change"><a class="markdownIt-Anchor" href="#handle-schema-change"></a> Handle Schema Change</h1><ul><li>Option 1, promote standby approach</li><li>Option 2, use the online DDL feature introduced in MySQL 5.6<ul><li>no blocked DML in most cases</li><li>Performance impact: data reorganization (sometimes), CPU and IO, replica lag</li><li>45 min</li></ul></li><li>pt-online-schema-change tool<ul><li>Less performance impact, but takes longer (2 hours)</li><li>needs an EC2 instance with the tools installed</li></ul></li></ul><h1 id="burst-mode"><a class="markdownIt-Anchor" href="#burst-mode"></a> Burst mode</h1><p>GP2 is designed to burst IOPS<br>T2 is designed to burst CPU</p><ul><li>The newer instance types with the burst feature can save costs</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://youtu.be/ZQnzjhnDloM" target="_blank" rel="noopener">https://youtu.be/ZQnzjhnDloM</a></p><h1 id="difference-between-mysql-and-mariadb"><a class="markdownIt-Anchor" href="#difference-between-mysql-and-mariadb"></a> Difference between MySQL and MariaDB</h1><p><a href="https://blog.panoply.io/a-comparative-vmariadb-vs-mysql" target="_blank" rel="noopener">https://blog.panoply.io/a-comparative-vmariadb-vs-mysql</a></p>]]></content>
<tags>
<tag> AWS </tag>
<tag> AWS RDS </tag>
<tag> MySQL </tag>
</tags>
</entry>
<entry>
<title>AWS - Kinesis</title>
<link href="2018/08/03/markdown/AWS/AWS2018/023a_Kinesis/"/>
<url>2018/08/03/markdown/AWS/AWS2018/023a_Kinesis/</url>
<content type="html"><![CDATA[<h1 id="kinesis-deepdive"><a class="markdownIt-Anchor" href="#kinesis-deepdive"></a> Kinesis Deepdive</h1><ul><li>No. 1 popular scenario: moving small, fast-moving data into a persistent layer</li><li>No. 2 popular scenario: streaming data, NRT notification systems</li></ul><p>Kinesis:</p><ul><li>managed service</li><li>streaming data ingestion</li><li>continuous processing</li></ul><p>Small, fast-moving data is captured quickly, then consumed concurrently by multiple different consumers for different analytics purposes.</p><ul><li>You can split / merge shards via console</li></ul><h2 id="best-practises"><a class="markdownIt-Anchor" href="#best-practises"></a> best practices</h2><h3 id="partition-key-strategy"><a class="markdownIt-Anchor" href="#partition-key-strategy"></a> partition key strategy</h3><ul><li>Avoid hot shards<ul><li>use a random partition key</li><li>use a high-cardinality key</li><li>use a business key: per billing customer, per device id, or per stock symbol</li></ul></li></ul><h3 id="provision-shards"><a class="markdownIt-Anchor" href="#provision-shards"></a> provision shards</h3><ul><li>provision enough shards</li><li>leave some headroom in the event of application failures</li></ul><h3 id="put-data-into-kinesis"><a class="markdownIt-Anchor" href="#put-data-into-kinesis"></a> put data into Kinesis</h3><ul><li>micro-batch before each put</li><li>consider the async producer in the AWS SDK<ul><li>Kinesis-Log4j-Appender</li></ul></li><li><strong>provisionedThroughputExceeded Error</strong><ul><li>retry</li><li>re-shard</li><li>track & monitor</li></ul></li><li>command to scale up</li></ul><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=myStream -Dscaling-action=scaleUp -Dcount=<span class="number">10</span> -Dregion=eu-west-<span 
class="number">1</span></span><br></pre></td></tr></table></figure><h3 id="ingest-data-from-kinesis"><a class="markdownIt-Anchor" href="#ingest-data-from-kinesis"></a> ingest data from kinesis</h3><ul><li>Amazon JDK<ul><li>one worker maps to one shard</li><li>library to feed data into S3, DynamoDB, Redshift, Elasticsearch.</li><li>data is fed through the following pipeline:<ul><li>ITransformer: transforms the data read from Kinesis</li><li>IFilter: keeps only the data of interest</li><li>IBuffer: batches the data before sending it out (for example, for S3 or Redshift it is better to buffer to MB level before sending)</li></ul></li><li>the Redshift connector puts data into S3 first, buffers it, then sends it to Redshift</li></ul></li><li>the application consuming the data should be able to scale automatically</li><li>use metrics to detect why the consumer is slow<ul><li>GetRecords.Latency</li></ul></li><li>build a flush-to-S3 consumer to capture the original data (by record count, by bytes, or by time)</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/8u9wIC1xNt8" target="_blank" rel="noopener">https://youtu.be/8u9wIC1xNt8</a></p></blockquote>]]></content>
<tags>
<tag> AWS </tag>
<tag> Kinesis </tag>
</tags>
</entry>
<entry>
<title>AWS - Redshift Deepdive</title>
<link href="2018/07/29/markdown/AWS/AWS2018/023a_RedShift/"/>
<url>2018/07/29/markdown/AWS/AWS2018/023a_RedShift/</url>
<content type="html"><![CDATA[<h1 id="redshift-archi-overview"><a class="markdownIt-Anchor" href="#redshift-archi-overview"></a> Redshift Archi overview</h1><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/023_RedShiftClusterArchi.png?raw=true" alt="Redshift Cluster Archi "></p><ul><li>Bottom Layer: Ingestion Backup & Restore layer</li><li>Leader Node & Compute Node<ul><li>Leader node :</li></ul></li><li>Shared-nothing MPP (Massively Parallel Processing) architecture</li><li>Reduce IO<ul><li>Columnar storage</li><li>Compress data (by column)</li><li><strong>Zone Maps</strong>: in-memory map of the min and max values of a given column in each block, used to prune the query and reduce IO</li></ul></li><li><strong>Slices</strong><ul><li>depending on CPU cores, each node supports a different number of slices</li><li>unit of data partitioning / parallel processing</li><li>table rows are distributed across different slices</li></ul></li><li>Data Distribution :<ul><li>ALL; Key; Even (round robin)</li></ul></li><li>Two types of storage hardware<ul><li>HDD is slower but can scale to petabytes (2PB); SSD is faster but only supports up to 300+ TB</li></ul></li></ul><h2 id="storage-deep-dive"><a class="markdownIt-Anchor" href="#storage-deep-dive"></a> Storage Deep Dive</h2><ul><li>Advertised (priced) storage is 1/3 of the storage actually used, because the other 2/3 is used for data copies.</li><li><strong>Blocks</strong>: column data is persisted as 1MB immutable blocks.<ul><li>With zone map metadata</li><li>location of next block</li><li>can be compressed</li></ul></li><li>A small write costs about the same as a larger write (1~10 rows = 100k rows)</li><li>Update & Delete only trigger a soft delete; use VACUUM or DEEP COPY to remove ghost rows</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/iuQgZDs-W7A" target="_blank" 
rel="noopener">https://youtu.be/iuQgZDs-W7A</a></p></blockquote><h1 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h1><ul><li>&lt;$1k/TB/year</li></ul><h2 id="data-ingestion"><a class="markdownIt-Anchor" href="#data-ingestion"></a> Data Ingestion</h2><ul><li><p>Ingestion Source: SSH, S3, EMR, DynamoDB</p></li><li><p>For the COPY command, each slice handles one COPY stream in a single thread.</p><ul><li>To get 100M/s, you need multiple slices and multiple nodes</li><li>Batch inserts save commit cost</li><li>If you have 16 slices, split the input into 16 files and load them with a single COPY command so that every slice works in parallel</li><li>During COPY, Redshift doesn’t enforce primary keys</li><li>Provide a manifest file in JSON format on S3 when copying from S3 to make sure the load behaves as expected.</li></ul></li><li><p>Redshift applies a query optimizer, but how well it optimizes depends on statistics</p><ul><li>COPY updates statistics automatically</li></ul></li><li><p>Redshift Data Compression</p><ul><li>COPY applies compression and selects encodings automatically</li></ul></li><li><p>Data Hygiene</p><ul><li>Analyze regularly (sort every week)</li><li>Vacuum regularly (weekly)</li><li>Use SVV_TABLE_INFO</li></ul></li><li><p>Automatic Compression</p><ul><li>Don’t compress sort keys<ul><li>It might result in scanning more rows than needed (compression packs many rows into one block)</li></ul></li></ul></li><li><p>Varchar columns (define as small as possible)</p><ul><li>the more memory oversized varchar columns waste, the fewer rows fit in memory for the query (the rest spills to disk)</li></ul></li><li><p>Compound Sort Keys</p></li><li><p>Don’t Forklift</p></li><li><p>On Redshift:</p><ul><li>Update = delete + insert</li><li>Commits are expensive; blocks are immutable (1MB) – load 1k rows at a time</li><li>no small commits</li><li>Concurrency should be low for better throughput</li></ul></li><li><p>between Redshift and the dashboard, add a cache layer</p></li><li><p><strong>Work Load 
Management</strong></p></li></ul><h2 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h2><ul><li>Source data from S3 – use envelope encryption</li><li>Encrypt data at rest<ul><li>enable when creating the cluster</li><li>Hardware acceleration (HSM)</li><li>~20% performance impact</li><li>4 layers of keys: block; database; cluster; master<ul><li>Benefit: key rotation re-encrypts only the keys in the hierarchy, not the whole data set</li></ul></li></ul></li><li>Encrypt specific columns to restrict what certain customers can view</li><li>Supports automatic encryption of unloaded data (data unloaded from Redshift to S3 files)</li></ul><h2 id="udf-user-defined-functions"><a class="markdownIt-Anchor" href="#udf-user-defined-functions"></a> UDF – User Defined functions</h2><ul><li>Use Python to write UDFs</li><li>Aggregate UDF<ul><li>you need to implement an init function, an aggregation function, and a finalize function</li></ul></li></ul><h2 id="multi-demintional-indexing-with-space-filling-curves"><a class="markdownIt-Anchor" href="#multi-demintional-indexing-with-space-filling-curves"></a> Multi-dimensional indexing with space filling curves</h2><ul><li>As the data starts to grow, you start to rely on<ul><li><strong>zone Maps</strong>: store the min and max values of a block in memory</li><li>Sorting</li><li>Projection: multiple copies of the data sorted in different ways</li></ul></li><li>new keyword for indexing: <strong>INTERLEAVED</strong></li></ul><h2 id="user-reference"><a class="markdownIt-Anchor" href="#user-reference"></a> User reference</h2><ul><li>automation framework: Azkaban (LinkedIn)</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/fmy3jCxUliM" target="_blank" rel="noopener">https://youtu.be/fmy3jCxUliM</a></p></blockquote><blockquote><p>Deepdive 2014<br><a href="https://youtu.be/K-Usisr0zwg" target="_blank" 
rel="noopener">https://youtu.be/K-Usisr0zwg</a></p></blockquote>]]></content>
<tags>
<tag> AWS </tag>
<tag> Redshift </tag>
</tags>
</entry>
<entry>
<title>Buzz Words</title>
<link href="2018/07/25/markdown/BackToBasic/buzzwords/"/>
<url>2018/07/25/markdown/BackToBasic/buzzwords/</url>
<content type="html"><![CDATA[<h1 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h1><p>Symmetric vs Asymmetric encryption</p><h1 id="blockchain"><a class="markdownIt-Anchor" href="#blockchain"></a> Blockchain</h1><ul><li>cryptographically verifiable</li></ul><h1 id="security-2"><a class="markdownIt-Anchor" href="#security-2"></a> Security</h1><ul><li>BlastRadius</li></ul>]]></content>
<tags>
<tag> buzz words </tag>
</tags>
</entry>