<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<title>The Nutanix Bible</title>
<meta name="description" content="The Nutanix Bible - A detailed narrative of the Nutanix architecture, how the software and features work and how to leverage it for maximum performance."/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="keywords" content="nutanix, nutanix bible,nutanix architecture, prism, acropolis,nutanix openstack, webscale"/>
<meta name="robots" content="index, follow"/>
<!-- Open Graph data -->
<meta property="og:title" content="The Nutanix Bible - NutanixBible.com"/>
<meta property="og:locale" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:description" content="The Nutanix Bible - A detailed narrative of the Nutanix architecture, how the software and features work and how to leverage it for maximum performance."/>
<meta property="og:url" content="http://NutanixBible.com"/>
<meta property="og:site_name" content="NutanixBible.com"/>
<meta property="og:image" content="http://nutanixbible.com/assets/Bible.png"/>
<!-- Twitter Card data -->
<meta name="twitter:card" content="summary"/>
<meta name="twitter:url" content="http://NutanixBible.com"/>
<meta name="twitter:description" content="The Nutanix Bible - A detailed narrative of the Nutanix architecture, how the software and features work and how to leverage it for maximum performance."/>
<meta name="twitter:title" content="The Nutanix Bible - NutanixBible.com"/>
<meta name="twitter:site" content="@StevenPoitras"/>
<meta name="twitter:domain" content="NutanixBible.com"/>
<meta name="twitter:image:src" content="http://nutanixbible.com/assets/Bible.png"/>
<meta name="twitter:creator" content="@StevenPoitras"/>
<!-- Google+ data -->
<meta itemprop="name" content="The Nutanix Bible - NutanixBible.com">
<meta itemprop="description" content="The Nutanix Bible - A detailed narrative of the Nutanix architecture, how the software and features work and how to leverage it for maximum performance.">
<meta itemprop="image" content="http://nutanixbible.com/assets/Bible.png">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-66778923-1', 'auto');
ga('send', 'pageview');
</script>
<link rel="stylesheet" type="text/css" href="css/nutanixbible.css">
</head>
<body data-type="book">
<!-- Google Tag Manager -->
<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-TR9PVL" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-TR9PVL');</script>
<!-- End Google Tag Manager -->
<div class="container">
<section data-type="titlepage" class="page-title" id="the-nutanix-bible-L02Ia">
<img src="assets/Bible.svg" alt="" class="biblesvg">
<h1>The Nutanix Bible</h1>
<p class="author">by Steven Poitras</p>
</section>
<section data-type="copyright-page" class="page-title" id="id-7ANIl">
<img src="assets/ornament1.svg" alt="" class="ornament">
<p class="small"><b>Copyright (c) 2016:</b> The Nutanix Bible and NutanixBible.com, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Steven Poitras and NutanixBible.com with appropriate and specific direction to the original content.</p>
<p>
Have feedback? Find a typo? Send feedback to <a href="mailto:[email protected]?Subject=Nutanix%20Bible%20Feedback!">[email protected]</a>!
</p>
<br>
<p>
Localized versions available:
</p>
<div class="localization">
<a href="http://nutanixbible.jp/" target="_blank">
<img src="assets/flag-japanese.svg" alt="Japanese" class="japanese">
</a>
<a href="http://www.virtual-space.co.kr/nutanix-works.html" target="_blank">
<img src="assets/flag-korean.svg" alt="Korean" class="korean">
</a>
<a href="http://nutanix.ru/" target="_blank">
<img src="assets/flag-russian.svg" alt="Russian" class="russian">
</a>
<a href="http://go.nutanix.com/rs/031-GVQ-112/images/Nutanix%20Bible[CN].pdf" target="_blank">
<img src="assets/flag-chinese.svg" alt="Chinese" class="Chinese">
</a>
</div>
</section>
<!-- START: Ken Chen 11-17-2015-->
<div id="nav-icon"><div></div></div>
<div class="nav-title">
Table of Contents
<div id="nav-close-button"></div>
</div>
<!-- END: Ken Chen 11-17-2015-->
<nav data-type="toc">
</nav>
<section data-type="preface" class="preface" id="foreword-7kBIw">
<h1 id="anchor-foreword-1">Foreword</h1>
<figure class="small" id="id-wntQsz">
<img alt="Dheeraj Pandey" class="iimagesv2dheeraj_pandeyjpg" src="imagesv2/Dheeraj_Pandey.jpg" style="width:60%; max-width:218px; horizontal-align:middle">
<figcaption><span class="label">Figure 1-1. </span>
<p class="sign">Dheeraj Pandey, CEO, Nutanix</p>
</figcaption>
</figure>
<blockquote>
<p>I am honored to write a foreword for this book that we've come to call "The Nutanix Bible." First and foremost, let me address the name of the book, which to some would seem not fully inclusive vis-à-vis their own faiths, or to others who are agnostic or atheist. There is a Merriam-Webster meaning of the word "bible" that is not literally about scriptures: "a publication that is preeminent especially in authoritativeness or wide readership". And that is how you should interpret its roots. It started being written by one of the most humble yet knowledgeable employees at Nutanix, Steven Poitras, our first Solution Architect who continues to be authoritative on the subject without wielding his "early employee" primogeniture. Knowledge to him was not power -- the act of sharing that knowledge is what makes him eminently powerful in this company. Steve epitomizes culture in this company -- by helping everyone else out with his authority on the subject, by helping them automate their chores in PowerShell or Python, by building insightful reference architectures (that are beautifully balanced in both content and form), by being a real-time buddy to anyone needing help on Yammer or Twitter, by being transparent with engineers on the need to self-reflect and self-improve, and by being ambitious.</p>
<p>When he came forward to write a blog, his big dream was to lead with transparency, and to build advocates in the field who would be empowered to make design trade-offs based on this transparency. It is rare for companies to open up on design and architecture as much as Steve has with his blog. Most open source companies -- who at the surface might seem transparent because their code is open source -- never talk in-depth about design, and "how it works" under the hood. When our competitors know about our product or design weaknesses, it makes us stronger -- because there is very little to hide, and everything to gain when something gets critiqued under a crosshair. A public admonition of a feature trade-off or a design decision drives the entire company on Yammer in quick time, and before long, we've a conclusion on whether it is a genuine weakness or a true strength that someone is fear-mongering on. Nutanix Bible, in essence, protects us from drinking our own kool aid. That is the power of an honest discourse with our customers and partners.</p>
<p>This ever-improving artifact, beyond being authoritative, is also enjoying wide readership across the world. Architects, managers, and CIOs alike, have stopped me in conference hallways to talk about how refreshingly lucid the writing style is, with some painfully detailed illustrations, visio diagrams, and pictorials. Steve has taken time to tell the web-scale story, without taking shortcuts. Democratizing our distributed architecture was not going to be easy in a world where most IT practitioners have been buried in dealing with the "urgent". The Bible bridges the gap between IT and DevOps, because it attempts to explain computer science and software engineering trade-offs in very simple terms. We hope that in the coming 3-5 years, IT will speak a language that helps them get closer to the DevOps' web-scale jargon.</p>
<p>With this first edition, we are converting Steve's blog into a book. The day we stop adding to this book is the beginning of the end of this company. I expect each and every one of you to keep reminding us of what brought us this far: truth, the whole truth, and nothing but the truth, will set you free (from complacency and hubris).</p>
<p>Keep us honest.</p>
</blockquote>
<p> </p>
<p class="sign">--Dheeraj Pandey, CEO, Nutanix</p>
<p> </p>
<figure id="id-aztlFk"><img alt="Stuart Miniman" class="iimagesv2stujpg" src="imagesv2/Stu.jpg" style="width:80%; max-width:218px; horizontal-align:middle">
<figcaption><span class="label">Figure 1-2. </span>
<p class="sign">Stuart Miniman, Principal Research Contributor, Wikibon</p>
</figcaption>
</figure>
<blockquote>
<p>Users today are constantly barraged by new technologies. There is no limit of new opportunities for IT to change to a "new and better way", but the adoption of new technology and, more importantly, the change of operations and processes is difficult. Even the huge growth of open source technologies has been hampered by lack of adequate documentation. Wikibon was founded on the principle that the community can help with this problem and in that spirit, The Nutanix Bible, which started as a blog post by Steve Poitras, has become a valuable reference point for IT practitioners that want to learn about hyperconvergence and web-scale principles or to dig deep into Nutanix and hypervisor architectures. The concepts that Steve has written about are advanced software engineering problems that some of the smartest engineers in the industry have designed a solution for. The book explains these technologies in a way that is understandable to IT generalists without compromising the technical veracity.</p>
<p>The concepts of distributed systems and software-led infrastructure are critical for IT practitioners to understand. I encourage both Nutanix customers and everyone who wants to understand these trends to read the book. The technologies discussed here power some of the largest datacenters in the world.</p>
</blockquote>
<p> </p>
<p class="sign">--Stuart Miniman, Principal Research Contributor, Wikibon</p>
<h2>Introduction</h2>
<figure id="id-ZptOIk"><img alt="Steven Poitras" src="imagesv2/poitras_pic.jpg" style="width:80%; max-width:500px; horizontal-align:middle">
<figcaption><span class="label">Figure 1-3. </span>
<p class="sign">Steven Poitras, Principal Solutions Architect, Nutanix</p>
</figcaption>
</figure>
<blockquote>
<p>Welcome to The Nutanix Bible! I work with the Nutanix platform on a daily basis – trying to find issues, push its limits as well as administer it for my production benchmarking lab. This item is being produced to serve as a living document outlining tips and tricks used every day by myself and a variety of engineers here at Nutanix.</p>
<p>NOTE: What you see here is an under-the-covers look at how things work. With that said, all topics discussed are abstracted by Nutanix, and this knowledge isn't required to successfully operate a Nutanix environment!</p>
<p>Enjoy!</p>
</blockquote>
<p> </p>
<p class="sign">--Steven Poitras, Principal Solutions Architect, Nutanix</p>
</section>
<div data-type="part" id="a-brief-lesson-in-history-6qVi1">
<h1><span class="label">Part I. </span>A Brief Lesson in History</h1>
<p>A brief look at the history of infrastructure and what has led us to where we are today.</p>
<section data-type="chapter" id="the-evolution-of-the-datacenter-R5INu4">
<h2>The Evolution of the Datacenter</h2>
<p>The datacenter has evolved significantly over the last several decades. The following sections will examine each era in detail. </p>
<section data-type="sect1" id="the-era-of-the-mainframe-NYI5u8uq">
<h3>The Era of the Mainframe</h3>
<p>The mainframe ruled for many years and laid the core foundation of where we are today. It allowed companies to leverage the following key characteristics:</p>
<ul>
<li>Natively converged CPU, main memory, and storage</li>
<li>Engineered internal redundancy</li>
</ul>
<p>But the mainframe also introduced the following issues:</p>
<ul>
<li>The high costs of procuring infrastructure</li>
<li>Inherent complexity</li>
<li>A lack of flexibility and highly siloed environments</li>
</ul>
</section>
<section data-type="sect1" id="the-move-to-stand-alone-servers-22IlTzuZ">
<h3>The Move to Stand-Alone Servers</h3>
<p>With mainframes, it was very difficult for organizations within a business to leverage these capabilities, which partly led to the entrance of pizza boxes, or stand-alone servers. Key characteristics of stand-alone servers included:</p>
<ul>
<li>CPU, main memory, and DAS storage</li>
<li>Higher flexibility than the mainframe</li>
<li>Accessed over the network</li>
</ul>
<p>These stand-alone servers introduced more issues:</p>
<ul>
<li>Increased number of silos</li>
<li>Low or unequal resource utilization</li>
<li>The server became a single point of failure (SPOF) for both compute AND storage</li>
</ul>
</section>
<section data-type="sect1" id="centralized-storage-3jIvSMu5">
<h3>Centralized Storage</h3>
<p>Businesses always need to make money and data is a key piece of that puzzle. With direct-attached storage (DAS), organizations either needed more space than was locally available, or data high availability (HA) where a server failure wouldn’t cause data unavailability.</p>
<p>Centralized storage replaced both the mainframe and the stand-alone server with sharable, larger pools of storage that also provided data protection. Key characteristics of centralized storage included:</p>
<ul>
<li>Pooled storage resources led to better storage utilization</li>
<li>Centralized data protection via RAID eliminated the chance that server loss caused data loss</li>
<li>Storage I/O was performed over the network</li>
</ul>
<p>Issues with centralized storage included:</p>
<ul>
<li>They were potentially more expensive; however, data is more valuable than the hardware</li>
<li>Increased complexity (SAN Fabric, WWPNs, RAID groups, volumes, spindle counts, etc.)</li>
<li>They required another management tool / team</li>
</ul>
</section>
<section data-type="sect1" id="the-introduction-of-virtualization-PgIzHQum">
<h3>The Introduction of Virtualization</h3>
<p>At this point in time, compute utilization was low and resource efficiency was impacting the bottom line. Virtualization was then introduced and enabled multiple workloads and operating systems (OSs) to run as virtual machines (VMs) on a single piece of hardware. Virtualization enabled businesses to increase utilization of their pizza boxes, but also increased the number of silos and the impacts of an outage. Key characteristics of virtualization included:</p>
<ul>
<li>Abstracting the OS from hardware (VM)</li>
<li>Very efficient compute utilization led to workload consolidation</li>
</ul>
<p>Issues with virtualization included:</p>
<ul>
<li>An increase in the number of silos and management complexity</li>
<li>A lack of VM high-availability, so if a compute node failed the impact was much larger</li>
<li>A lack of pooled resources</li>
<li>The need for another management tool / team</li>
</ul>
</section>
<section data-type="sect1" id="virtualization-matures-lkInFEuj">
<h3>Virtualization Matures</h3>
<p>The hypervisor became a very efficient and feature-filled solution. With the advent of tools, including VMware vMotion, HA, and DRS, users obtained the ability to provide VM high availability and migrate compute workloads dynamically. The only caveat was the reliance on centralized storage, causing the two paths to merge. The downside was that the load on the storage array was now higher than before, and VM sprawl led to contention for storage I/O. Key characteristics included:</p>
<ul>
<li>Clustering led to pooled compute resources</li>
<li>The ability to dynamically migrate workloads between compute nodes (DRS / vMotion)</li>
<li>The introduction of VM high availability (HA) in the case of a compute node failure</li>
<li>A requirement for centralized storage</li>
</ul>
<p>Issues included:</p>
<ul>
<li>Higher demand on storage due to VM sprawl</li>
<li>Requirements to scale out more arrays creating more silos and more complexity</li>
<li>Higher $ / GB due to requirement of an array</li>
<li>The possibility of resource contention on array</li>
<li>It made storage configuration much more complex due to the necessity to ensure:
<ul>
<li>VM to datastore / LUN ratios</li>
<li>Spindle count to facilitate I/O requirements</li>
</ul>
</li>
</ul>
</section>
<section data-type="sect1" id="solid-state-disks-ssds-M2I9tquP">
<h3>Solid State Disks (SSDs)</h3>
<p>SSDs helped alleviate this I/O bottleneck by providing much higher I/O performance without the need for tons of disk enclosures. However, given the extreme advances in performance, the controllers and network had not yet evolved to handle the vast I/O available. Key characteristics of SSDs included:</p>
<ul>
<li>Much higher I/O characteristics than traditional HDD</li>
<li>Essentially eliminated seek times</li>
</ul>
<p>SSD issues included:</p>
<ul>
<li>The bottleneck shifted from storage I/O on disk to the controller / network</li>
<li>Silos still remained</li>
<li>Array configuration complexity still remained</li>
</ul>
</section>
<section data-type="sect1">
<h3>In Comes Cloud</h3>
<p>
The term cloud can be very ambiguous by definition. Simply put, it's the ability to consume and leverage a service hosted somewhere and provided by someone else.
</p>
<p>
With the introduction of cloud, the perspectives of IT, the business, and end-users have shifted.
</p>
<p>
Core pillars of any cloud service:
</p>
<ul>
<li>
Self-service / On-demand
<ul>
<li>
Rapid time to value (TTV) / little barrier to entry
</li>
</ul>
</li>
<li>
Service and SLA focus
<ul>
<li>
Contractual guarantees around uptime / availability / performance
</li>
</ul>
</li>
<li>
Fractional consumption model
<ul>
<li>
Pay for what you use (some services are free)
</li>
</ul>
</li>
</ul>
<h5>Cloud Classifications</h5>
<p>
Most general classifications of cloud fall into three main buckets (starting at the highest level and moving downward):
</p>
<ul>
<li>
Software as a Service (SaaS)
<ul>
<li>
Any software / service consumed via a simple URL
</li>
<li>
Examples: Workday, Salesforce.com, Google search, etc.
</li>
</ul>
</li>
<li>
Platform as a Service (PaaS)
<ul>
<li>
Development and deployment platform
</li>
<li>
Examples: Amazon Elastic Beanstalk / Relational Database Service (RDS), Google App Engine, etc.
</li>
</ul>
</li>
<li>
Infrastructure as a Service (IaaS)
<ul>
<li>
VMs/Containers/NFV as a service
</li>
<li>
Examples: Amazon EC2/ECS, Microsoft Azure, Google Compute Engine (GCE), etc.
</li>
</ul>
</li>
</ul>
<h5>Shift in IT focus</h5>
<p>
Cloud poses an interesting dilemma for IT. They can embrace it, or they can try to provide an alternative. They want to keep the data internal, but need to allow for the self-service, rapid nature of cloud.
</p>
<p>
This shift forces IT to act more as a legitimate service provider to their end-users (company employees).
</p>
</section>
</section>
<section data-type="chapter" id="the-importance-of-latency-13I4Ta">
<h2>The Importance of Latency</h2>
<p>The table below characterizes the various latencies for specific types of I/O:</p>
<table>
<tr>
<th>Item</th>
<th>Latency</th>
<th>Comments</th>
</tr>
<tr>
<td>L1 cache reference</td>
<td>0.5 ns</td>
<td></td>
</tr>
<tr>
<td>Branch Mispredict</td>
<td>5 ns</td>
<td></td>
</tr>
<tr>
<td>L2 cache reference</td>
<td>7 ns</td>
<td>14x L1 cache</td>
</tr>
<tr>
<td>Mutex lock/unlock</td>
<td>25 ns</td>
<td></td>
</tr>
<tr>
<td>Main memory reference</td>
<td>100 ns</td>
<td>~14x L2 cache, 200x L1 cache</td>
</tr>
<tr>
<td>Compress 1KB with Zippy</td>
<td>3,000 ns</td>
<td></td>
</tr>
<tr>
<td>Send 1KB over 1Gbps network</td>
<td>10,000 ns</td>
<td>0.01 ms</td>
</tr>
<tr>
<td>Read 4K randomly from SSD</td>
<td>150,000 ns</td>
<td>0.15 ms</td>
</tr>
<tr>
<td>Read 1MB sequentially from memory</td>
<td>250,000 ns</td>
<td>0.25 ms</td>
</tr>
<tr>
<td>Round trip within datacenter</td>
<td>500,000 ns</td>
<td>0.5 ms</td>
</tr>
<tr>
<td>Read 1MB sequentially from SSD</td>
<td>1,000,000 ns</td>
<td>1 ms, 4x memory</td>
</tr>
<tr>
<td>Disk seek</td>
<td>10,000,000 ns</td>
<td>10 ms, 20x datacenter round trip</td>
</tr>
<tr>
<td>Read 1MB sequentially from disk</td>
<td>20,000,000 ns</td>
<td>20 ms, 80x memory, 20x SSD</td>
</tr>
<tr>
<td>Send packet CA -> Netherlands -> CA</td>
<td>150,000,000 ns</td>
<td>150 ms</td>
</tr>
</table>
<p><em>(credit: Jeff Dean, https://gist.github.com/jboner/2841832)</em></p>
<p>The table above shows that the CPU can access its caches at anywhere from ~0.5-7ns (L1 vs. L2). For main memory, these accesses occur at ~100ns, whereas a local 4K SSD read is ~150,000ns or 0.15ms.</p>
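<p>The ratios in the table can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary keys and the helper names <code>ratio</code> and <code>to_ms</code> are illustrative only, and the values are simply copied from the table above.</p>

```python
# Latencies in nanoseconds, copied from a subset of the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Read 4K randomly from SSD": 150_000,
    "Round trip within datacenter": 500_000,
    "Disk seek": 10_000_000,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def to_ms(name: str) -> float:
    """Convert a table entry from nanoseconds to milliseconds."""
    return LATENCY_NS[name] / 1_000_000


if __name__ == "__main__":
    print(ratio("L2 cache reference", "L1 cache reference"))
    print(ratio("Read 4K randomly from SSD", "Main memory reference"))
    print(to_ms("Read 4K randomly from SSD"))
```

<p>Under these numbers, a local 4K SSD read is roughly 1,500 times slower than a main memory reference, which is why the placement of data relative to the CPU matters so much.</p>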
<p>If we take a typical enterprise-class SSD (in this case the Intel S3700 - <a href="http://download.intel.com/newsroom/kits/ssd/pdfs/Intel_SSD_DC_S3700_Product_Specification.pdf">SPEC</a>), this device is capable of the following:</p>
<ul>
<li>Random I/O performance:
<ul>
<li>Random 4K Reads: Up to 75,000 IOPS</li>
<li>Random 4K Writes: Up to 36,000 IOPS</li>
</ul>
</li>
<li>Sequential bandwidth:
<ul>
<li>Sustained Sequential Read: Up to 500MB/s</li>
<li>Sustained Sequential Write: Up to 460MB/s</li>
</ul>
</li>
<li>Latency:
<ul>
<li>Read: 50us</li>
<li>Write: 65us</li>
</ul>
</li>
</ul>
<section data-type="sect1" id="looking-at-the-bandwidth-QMIXtzTn">
<h3>Looking at the Bandwidth</h3>
<p>For traditional storage, there are a few main types of media for I/O:</p>
<ul>
<li>Fibre Channel (FC)
<ul>
<li>4-, 8-, and 16-Gb</li>
</ul>
</li>
<li>Ethernet (including FCoE)
<ul>
<li>1-, 10-Gb, (40-Gb IB), etc.</li>
</ul>
</li>
</ul>
<p>For the calculation below, we are using the 500MB/s Read and 460MB/s Write BW available from the Intel S3700.</p>
<p>The calculation is done as follows:</p>
<p>numSSD = ROUNDUP((numConnections * connBW (in GB/s))/ ssdBW (R or W))</p>
<p><i>NOTE: </i><em>Numbers were rounded up as a partial SSD isn’t possible. This also does not account for the necessary CPU required to handle all of the I/O and assumes unlimited controller CPU power.</em></p>
<table>
<tbody>
<tr>
<th colspan="2" rowspan="1">Network BW</th>
<th colspan="2" rowspan="1">SSDs required to saturate network BW</th>
</tr>
<tr>
<th>Controller Connectivity</th>
<th>Available Network BW</th>
<th>Read I/O</th>
<th>Write I/O</th>
</tr>
<tr>
<td>Dual 4Gb FC</td>
<td>8Gb == 1GB</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>Dual 8Gb FC</td>
<td>16Gb == 2GB</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>Dual 16Gb FC</td>
<td>32Gb == 4GB</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<td>Dual 1Gb ETH</td>
<td>2Gb == 0.25GB</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Dual 10Gb ETH</td>
<td>20Gb == 2.5GB</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>
<p>As the table shows, if you wanted to leverage the theoretical maximum performance an SSD could offer, the network can become a bottleneck with anywhere from 1 to 9 SSDs, depending on the type of networking leveraged.</p>
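<p>The numSSD calculation above can be sketched in Python as follows. The function name <code>ssds_to_saturate</code> and the <code>LINKS</code> dictionary are hypothetical names chosen for illustration; the sketch uses the same simplification as the table (8Gb == 1GB == 1,000MB, no protocol overhead, unlimited controller CPU).</p>

```python
import math

# Intel S3700 sustained sequential bandwidth from the text, in MB/s.
SSD_READ_MBPS = 500
SSD_WRITE_MBPS = 460


def ssds_to_saturate(num_connections: int, conn_gbit: float, ssd_mbps: float) -> int:
    """numSSD = ROUNDUP((numConnections * connBW) / ssdBW)."""
    # Convert link speed from Gb/s to MB/s: divide by 8, multiply by 1,000.
    network_mbps = num_connections * conn_gbit / 8 * 1000
    # Round up because a partial SSD isn't possible.
    return math.ceil(network_mbps / ssd_mbps)


# (number of connections, speed of each connection in Gb/s)
LINKS = {
    "Dual 4Gb FC": (2, 4),
    "Dual 8Gb FC": (2, 8),
    "Dual 16Gb FC": (2, 16),
    "Dual 1Gb ETH": (2, 1),
    "Dual 10Gb ETH": (2, 10),
}

if __name__ == "__main__":
    for name, (count, gbit) in LINKS.items():
        reads = ssds_to_saturate(count, gbit, SSD_READ_MBPS)
        writes = ssds_to_saturate(count, gbit, SSD_WRITE_MBPS)
        print(f"{name}: {reads} SSDs (read), {writes} SSDs (write)")
```

<p>Running the sketch should reproduce the read and write columns of the table, from 1 SSD for dual 1Gb Ethernet up to 9 SSDs for dual 16Gb FC writes.</p>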
</section>
<section data-type="sect1" id="the-impact-to-memory-latency-J5IMf8Ty">
<h3>The Impact to Memory Latency</h3>
<p>Given that typical main memory latency is ~100ns (this will vary), we can perform the following calculations:</p>
<ul>
<li>Local memory read latency = 100ns + [OS / hypervisor overhead]</li>
<li>Network memory read latency = 100ns + NW RTT latency + [2 x OS / hypervisor overhead]</li>
</ul>
<p>If we assume a typical network RTT of ~0.5ms (this will vary by switch vendor), which is ~500,000ns, that would come down to:</p>
<ul>
<li>Network memory read latency = 100ns + 500,000ns + [2 x OS / hypervisor overhead]</li>
</ul>
<p>If we theoretically assume a very fast network with a 10,000ns RTT:</p>
<ul>
<li>Network memory read latency = 100ns + 10,000ns + [2 x OS / hypervisor overhead]</li>
</ul>
<p>What that means is even with a theoretically fast network, there is a 10,000% overhead when compared to a non-network memory access. With a slow network this can be upwards of a 500,000% latency overhead.</p>
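<p>The percentages above follow directly from the formulas. Here is a minimal Python sketch of that arithmetic; the function names are hypothetical, and the OS / hypervisor overhead term defaults to zero because the text does not give a value for it.</p>

```python
LOCAL_MEMORY_NS = 100  # typical main memory latency from the text


def network_read_ns(rtt_ns: float, overhead_ns: float = 0.0) -> float:
    """Network memory read latency = 100ns + NW RTT + 2 x OS / hypervisor overhead."""
    return LOCAL_MEMORY_NS + rtt_ns + 2 * overhead_ns


def added_latency_percent(rtt_ns: float) -> float:
    """Extra latency over a local read, as a percentage (overhead terms ignored)."""
    extra_ns = network_read_ns(rtt_ns) - LOCAL_MEMORY_NS
    return extra_ns / LOCAL_MEMORY_NS * 100


if __name__ == "__main__":
    print(added_latency_percent(10_000))   # very fast network, 10,000ns RTT
    print(added_latency_percent(500_000))  # typical network, ~0.5ms RTT
```

<p>With a 10,000ns RTT the added latency works out to 10,000% of a local read, and with a 500,000ns RTT it works out to 500,000%, matching the figures in the text.</p>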
<p>In order to alleviate this overhead, server side caching technologies are introduced.</p>
</section>
</section>
<section data-type="chapter" id="book-of-web-scale-NYIQSn">
<h2>Book of Web-Scale</h2>
<p class="definition"><strong>web·scale - /web ' skãl/ - noun - computing architecture</strong>
<br>
a new architectural approach to infrastructure and computing.</p>
<p>This section will present some of the core concepts behind “Web-scale” infrastructure and why we leverage them. Before I get started, I just wanted to clearly state that Web-scale doesn’t mean you need to be “web-scale” (e.g. Google, Facebook, or Microsoft). These constructs are applicable and beneficial at any scale (3-nodes or thousands of nodes).</p>
<p>Historical challenges included:</p>
<ul>
<li>Complexity, complexity, complexity</li>
<li>Desire for incremental based growth</li>
<li>The need to be agile</li>
</ul>
<p>There are a few key constructs used when talking about “Web-scale” infrastructure:</p>
<ul>
<li>Hyper-convergence</li>
<li>Software defined intelligence</li>
<li>Distributed autonomous systems</li>
<li>Incremental and linear scale out</li>
</ul>
<p>Other related items:</p>
<ul>
<li>API-based automation and rich analytics</li>
<li>Self-healing</li>
</ul>
<p>The following sections will provide a technical perspective on what these constructs actually mean.</p>
<section data-type="sect1" id="hyper-convergence-ONIRcvSY">
<h3>Hyper-Convergence</h3>
<p>There are differing opinions on what hyper-convergence actually is. It also varies based on the scope of components (e.g. virtualization, networking, etc.). However, the core concept comes down to the following: natively combining two or more components into a single unit. ‘Natively’ is the key word here. In order to be the most effective, the components must be natively integrated and not just bundled together. In the case of Nutanix, we natively converge compute + storage to form a single node used in our appliance. For others, this might be converging storage with the network, etc. What it really means:</p>
<ul>
<li>Natively integrating two or more components into a single unit which can be easily scaled</li>
</ul>
<p>Benefits include:</p>
<ul>
<li>Single unit to scale</li>
<li>Localized I/O</li>
<li>Eliminates traditional compute / storage silos by converging them</li>
</ul>
</section>
<section data-type="sect1" id="software-defined-intelligence-nrIRIWSn">
<h3>Software-Defined Intelligence</h3>
<p>Software-defined intelligence is taking the core logic from normally proprietary or specialized hardware (e.g. ASIC / FPGA) and doing it in software on commodity hardware. For Nutanix, we take the traditional storage logic (e.g. RAID, deduplication, compression, etc.) and put that into software that runs in each of the Nutanix Controller VMs (CVM) on standard x86 hardware. What it really means:</p>
<ul>
<li>Pulling key logic from hardware and doing it in software on commodity hardware</li>
</ul>
<p>Benefits include:</p>
<ul>
<li>Rapid release cycles</li>
<li>Elimination of proprietary hardware reliance</li>
<li>Utilization of commodity hardware for better economics</li>
</ul>
</section>
<section data-type="sect1" id="distributed-autonomous-systems-b1IeU4Sb">
<h3>Distributed Autonomous Systems</h3>
<p>Distributed autonomous systems involve moving away from the traditional concept of having a single unit responsible for doing something and distributing that role among all nodes within the cluster. You can think of this as creating a purely distributed system. Traditionally, vendors have assumed that hardware will be reliable, which, in most cases, can be true. However, core to distributed systems is the idea that hardware will eventually fail, and handling that fault in an elegant and non-disruptive way is key.</p>
<p>These distributed systems are designed to accommodate and remediate failure, to form something that is self-healing and autonomous. In the event of a component failure, the system will transparently handle and remediate the failure, continuing to operate as expected. Alerting will make the user aware, but rather than being a critical time-sensitive item, any remediation (e.g. replacing a failed node) can be done on the admin’s schedule. Another way to put it is fail in-place (rebuild without replace). For items where a “master” is needed, an election process is utilized; in the event this master fails, a new master is elected. To distribute the processing of tasks, MapReduce concepts are leveraged. What it really means:</p>
<ul>
<li>Distributing roles and responsibilities to all nodes within the system</li>
<li>Utilizing concepts like MapReduce to perform distributed processing of tasks</li>
<li>Using an election process in the case where a “master” is needed</li>
</ul>
<p>Benefits include:</p>
<ul>
<li>Eliminates any single points of failure (SPOF)</li>
<li>Distributes workload to eliminate any bottlenecks</li>
</ul>
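<p>To make the election and MapReduce ideas a bit more concrete, here is a minimal sketch in Python. It is a toy model only: the "lowest live node ID wins" rule, the <code>Cluster</code> class, and the <code>map_reduce</code> helper are assumptions made for illustration and are not how the Nutanix services actually elect a master or schedule work.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster: the live node with the lowest ID acts as master (assumed rule)."""
    live_nodes: set = field(default_factory=set)

    def master(self):
        # A deterministic rule lets every node arrive at the same answer.
        return min(self.live_nodes) if self.live_nodes else None

    def fail(self, node_id):
        # Dropping a node means the next master() lookup re-elects if needed.
        self.live_nodes.discard(node_id)


def map_reduce(shards, map_fn, reduce_fn):
    """Run map_fn on each node's local shard, then combine the partial results."""
    partials = [map_fn(shard) for shard in shards.values()]
    return reduce_fn(partials)


cluster = Cluster(live_nodes={1, 2, 3})
first_master = cluster.master()
cluster.fail(first_master)          # simulate the master failing
second_master = cluster.master()    # a new master is elected

# Hypothetical per-node data: each node sums its own shard, then totals are summed.
shards = {1: [4, 8], 2: [15], 3: [16, 23, 42]}
total = map_reduce(shards, sum, sum)

print(first_master, second_master, total)
```

<p>The point of the sketch is that no node is special: any surviving node can take over the master role, and the work itself is split so that each node only processes its local data.</p>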
</section>
<section data-type="sect1" id="incremental-and-linear-scale-out-rkIZhxSN">
<h3>Incremental and Linear Scale Out</h3>
<p>Incremental and linear scale out relates to the ability to start with a certain set of resources and, as needed, scale them out while linearly increasing the performance of the system. All of the constructs mentioned above are critical enablers in making this a reality. For example, traditionally you’d have three layers of components for running virtual workloads: servers, storage, and network – all of which are scaled independently. As an example, when you scale out the number of servers you’re not scaling out your storage performance. With a hyper-converged platform like Nutanix, when you scale out with new node(s) you’re scaling out:</p>
<ul>
<li>The number of hypervisor / compute nodes</li>
<li>The number of storage controllers</li>
<li>The compute and storage performance / capacity</li>
<li>The number of nodes participating in cluster wide operations</li>
</ul>
<p>What it really means:</p>
<ul>
<li>The ability to incrementally scale storage / compute with linear increases to performance / ability</li>
</ul>
<p>Benefits include:</p>
<ul>
<li>The ability to start small and scale</li>
<li>Uniform and consistent performance at any scale</li>
</ul>
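<p>As a rough illustration of what "linear" means here, the following sketch models an idealized cluster in which every node contributes the same increment of controllers, performance and capacity. The per-node figures and the <code>cluster_totals</code> helper are hypothetical and chosen only to make the arithmetic easy to follow; real numbers depend on the hardware and workload.</p>

```python
def cluster_totals(node_count, iops_per_node, tib_per_node):
    """Idealized linear model: every node adds the same increment."""
    return {
        "controllers": node_count,                 # one storage controller per node
        "iops": node_count * iops_per_node,
        "capacity_tib": node_count * tib_per_node,
    }


# Hypothetical per-node figures, for illustration only.
small = cluster_totals(3, 25_000, 10)
large = cluster_totals(6, 25_000, 10)

print(small)
print(large)
```

<p>Doubling the node count doubles every total in this model, which is the behavior the bullets above describe: compute, storage controllers, performance and capacity all grow together.</p>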
</section>
</section>
<section data-type="sect1" id="making-sense-of-it-all-22IWHQ">
<h3>Making Sense of It All</h3>
<p>In summary:</p>
<ol>
<li>Inefficient compute utilization led to the move to virtualization</li>
<li>Features including vMotion, HA, and DRS led to the requirement of centralized storage</li>
<li>VM sprawl led to increased load and contention on storage</li>
<li>SSDs came in to alleviate the issues but changed the bottleneck to the network / controllers</li>
<li>Cache / memory accesses over the network face large overheads, minimizing their benefits</li>
<li>Array configuration complexity still remains the same</li>
<li>Server-side caches were introduced to alleviate the load on the array / impact of the network, but they introduce another component to the solution</li>
<li>Locality helps alleviate the bottlenecks / overheads traditionally faced when going over the network</li>
<li>This shifts the focus from infrastructure to ease of management and simplification of the stack</li>
<li>The birth of the Web-Scale world!</li>
</ol>
</section>
</div>
<div data-type="part" id="book-of-prism-7gEiv">
<h1><span class="label">Part II. </span>Book of Prism</h1>
<p class="definition"><strong>prism - /'prizɘm/ - noun - control plane</strong>
<br>
one-click management and interface for datacenter operations.</p>
<section data-type="chapter" id="design-methodology-and-iterations-13IRuV">
<h2>Design Methodology and Iterations</h2>
<p>
Building a beautiful, empathetic and intuitive product is core to the Nutanix platform and something we take very seriously. This section will cover our design methodology and how we iterate on it. More coming here soon!
</p>
<p>
In the meantime feel free to check out this great post on our design methodology and iterations by our Product Design Lead, Jeremy Sallee (who also designed this) - <a href="http://salleedesign.com/stuff/sdwip/blog/nutanix-case-study/">http://salleedesign.com/stuff/sdwip/blog/nutanix-case-study/</a>
</p>
<p>
You can download the Nutanix Visio stencils here: <a href="http://www.visiocafe.com/nutanix.htm">http://www.visiocafe.com/nutanix.htm</a>
</p>
</section>
<section data-type="chapter" id="architecture-NYIVT0">
<h2>Architecture</h2>
<p>Prism is a distributed resource management platform which allows users to manage and monitor objects and services across their Nutanix environment.</p>
<p>These capabilities are broken down into two key categories:</p>
<ul>
<li>Interfaces
<ul>
<li>HTML5 UI, REST API, CLI, PowerShell CMDlets, etc.</li>
</ul>
</li>
<li>Management
<ul>
<li>Policy definition and compliance, service design and status, analytics and monitoring</li>
</ul>
</li>
</ul>
<p>The following figure illustrates the conceptual nature of Prism as part of the Nutanix platform:</p>
<figure id="id-XWtxHVTW"><img alt="High-Level Prism Architecture" class="iimagesv2arch_prismpng" src="imagesv2/arch_prism.png">
<figcaption><span class="label">Figure 5-1. </span>High-Level Prism Architecture</figcaption>
</figure>
<p>Prism is broken down into two main components:</p>
<ul>
<li>Prism Central (PC)
<ul>
<li>Multi-cluster manager responsible for managing multiple Acropolis Clusters to provide a single, centralized management interface. Prism Central is an optional software appliance (VM) which can be deployed in addition to the Acropolis Cluster (and can run on it).</li>
<li>1-to-many cluster manager</li>
</ul>
</li>
<li>Prism Element (PE)
<ul>
<li>Localized cluster manager responsible for local cluster management and operations. Every Acropolis Cluster has Prism Element built-in.</li>
<li>1-to-1 cluster manager</li>
</ul>
</li>
</ul>
<p>The following figure illustrates the conceptual relationship between Prism Central and Prism Element:</p>
<figure id="id-zmt2i4Tx"><img alt="Prism Architecture" class="iimagesv2prism_arch2png" src="imagesv2/prism_arch2.png">
<figcaption><span class="label">Figure 5-2. </span>Prism Architecture</figcaption>
</figure>
<div data-type="note" class="note" id="pro-tip-05i5cRT9"><h6>Note</h6>
<h5>Pro tip</h5>
<p>For larger or distributed deployments (e.g. more than one cluster or multiple sites) it is recommended to use Prism Central to simplify operations and provide a single management UI for all clusters / sites.</p>
</div>
<h3>Prism Services</h3>
<p>A Prism service runs on every CVM, with an elected Prism Leader which is responsible for handling HTTP requests. Similar to other components which have a Master, if the Prism Leader fails, a new one will be elected. When a CVM which is not the Prism Leader gets an HTTP request, it will permanently redirect the request to the current Prism Leader using HTTP response status code 301.</p>
<p>Here we show a conceptual view of the Prism services and how HTTP request(s) are handled:</p>
<figure id="id-DktNCvTn"><img alt="Prism Services - Request Handling" class="iimagesv2prism_services3png" src="imagesv2/prism_services3.png">
<figcaption><span class="label">Figure 5-3. </span>Prism Services - Request Handling</figcaption>
</figure>
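<p>The request handling described above can be sketched as a small decision function. This is a simplified model, not the Prism implementation: the <code>handle_request</code> name, the shape of the return value, the exact <code>Location</code> URL format and the example CVM addresses are all assumptions for illustration. Only the 301 status code and port 9440 come from the text.</p>

```python
def handle_request(local_cvm, leader_cvm, path):
    """Return (status, headers) for a request arriving at local_cvm.

    Assumed behavior: the leader serves the request itself, while any other
    CVM answers with a permanent redirect pointing at the leader.
    """
    if local_cvm == leader_cvm:
        return 200, {}
    # Assumed URL layout; the redirect targets the leader on the HTTPS port.
    location = f"https://{leader_cvm}:9440{path}"
    return 301, {"Location": location}


# Hypothetical CVM addresses from the documentation range 192.0.2.0/24.
on_leader = handle_request("192.0.2.11", "192.0.2.11", "/console/")
on_follower = handle_request("192.0.2.12", "192.0.2.11", "/console/")

print(on_leader)
print(on_follower)
```

<p>Because the redirect is permanent, a client that follows it ends up talking to the leader directly for subsequent requests, at least until a new leader is elected.</p>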
<div data-type="note" class="note" id="prism-ports-53iysDTA"><h6>Note</h6>
<h3>Prism ports</h3>
<p>Prism listens on ports 80 and 9440; if HTTP traffic comes in on port 80, it is redirected to HTTPS on port 9440.</p>
</div>
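<p>For clarity, the port behavior in the note can be expressed as a URL rewrite. The sketch below uses Python's standard <code>urllib.parse</code> module; the <code>to_https_9440</code> helper and the example address are hypothetical, and the function only models the mapping from an incoming URL to its redirect target (it does not handle IPv6 literals).</p>

```python
from urllib.parse import urlsplit, urlunsplit


def to_https_9440(url):
    """Map a plain-HTTP URL on port 80 to its HTTPS equivalent on port 9440."""
    parts = urlsplit(url)
    if parts.scheme == "http" and (parts.port or 80) == 80:
        netloc = f"{parts.hostname}:9440"
        return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))
    # Anything else (e.g. already HTTPS on 9440) is left untouched.
    return url


redirected = to_https_9440("http://192.0.2.10/console/")
unchanged = to_https_9440("https://192.0.2.10:9440/console/")

print(redirected)
print(unchanged)
```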
<p>When using the cluster external IP (recommended), it will always be hosted by the current Prism Leader. In the event of a Prism Leader failure the cluster IP will be assumed by the newly elected Prism Leader and a gratuitous ARP (gARP) will be used to clean any stale ARP cache entries. In this scenario any time the cluster IP is used to access Prism, no redirection is necessary as that will already be the Prism Leader.</p>
<div data-type="note" class="note" id="pro-tip-1RiATNTX"><h6>Note</h6>
<h5>Pro tip</h5>
<p>You can determine the current Prism leader by running 'curl localhost:2019/prism/leader' on any CVM.</p>
</div>
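<p>If you would rather script the leader lookup than run curl by hand, a minimal Python equivalent might look like the following. The URL is the one from the tip above; the <code>get_prism_leader</code> helper and its injectable <code>fetch</code> argument are assumptions added for illustration. Since the response format is not described here, the sketch simply returns the raw body as text rather than trying to parse it.</p>

```python
from urllib.request import urlopen

LEADER_URL = "http://localhost:2019/prism/leader"


def get_prism_leader(fetch=None, url=LEADER_URL, timeout=5):
    """Return the raw response body from the leader endpoint as text.

    fetch is an optional callable taking a URL and returning bytes; it exists
    so the function can be exercised without a live CVM.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u, timeout=timeout) as resp:
                return resp.read()
    return fetch(url).decode("utf-8", errors="replace").strip()


# Offline demonstration with a stand-in fetcher and a placeholder body.
demo = get_prism_leader(fetch=lambda u: b"placeholder-body\n")
print(demo)
```

<p>On a CVM you would call <code>get_prism_leader()</code> with no arguments, which performs the same request as the curl command.</p>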
</section>
<section data-type="chapter" id="navigation-22IzS5">
<h2>Navigation</h2>
<p>Prism is fairly straightforward and simple to use; however, we'll cover some of the main pages and basic usage.</p>
<p>Prism Central (if deployed) can be accessed using the IP address specified during configuration or corresponding DNS entry. Prism Element can be accessed via Prism Central (by clicking on a specific cluster) or by navigating to any Nutanix CVM or cluster IP (preferred).</p>
<p>Once the page has loaded you will be greeted with the Login page, where you will use your Prism or Active Directory credentials to log in.</p>
<figure id="id-XWtpS0SW"><img alt="Prism Login Page" class="iimagesv2prismprism_loginpng" src="imagesv2/Prism/prism_login.png">
<figcaption><span class="label">Figure 6-1. </span>Prism Login Page</figcaption>
</figure>
<p>Upon successful login you will be sent to the dashboard page which will provide overview information for managed cluster(s) in Prism Central or the local cluster in Prism Element.</p>
<p>Prism Central and Prism Element will be covered in more detail in the following sections.</p>
<section data-type="sect1" id="prism-central-4aI3tqSe">
<h3>Prism Central</h3>
<p>Prism Central contains the following main pages:</p>
<ul>
<li>Home Page
<ul>
<li>Environment wide monitoring dashboard including detailed information on service status, capacity planning, performance, tasks, etc. To get further information on any of them you can click on the item of interest.</li>
</ul>
</li>
<li>Explore Page
<ul>
<li>Management and monitoring of services, cluster, VMs and hosts</li>
</ul>
</li>
<li>Analysis Page
<ul>
<li>Detailed performance analysis for cluster and managed objects with event correlation</li>
</ul>
</li>
<li>Alerts
<ul>
<li>Environment wide alerts</li>
</ul>
</li>
</ul>
<p>The figure shows a sample Prism Central dashboard where multiple clusters can be monitored / managed:</p>
<figure class="large" id="id-0OtpSxt2SW"><img alt="Prism Central - Dashboard" class="iimagesv2prismpc_dashboard2png" src="imagesv2/Prism/PC_dashboard2.png">
<figcaption><span class="label">Figure 6-2. </span>Prism Central - Dashboard</figcaption>
</figure>
<p>From here you can monitor the overall status of your environment, and dive deeper if there are any alerts or items of interest.</p>
<div data-type="note" class="note" id="pro-tip-G9iVFptNS9"><h6>Note</h6>
<h5>Pro tip</h5>
<p>If everything is green, go back to doing something else :)</p>
</div>
</section>
<section data-type="sect1" id="prism-element-BNIWfdSz">
<h3>Prism Element</h3>
<p>Prism Element contains the following main pages:</p>
<ul>
<li>Home Page
<ul>
<li>Local cluster monitoring dashboard including detailed information on alerts, capacity, performance, health, tasks, etc. To get further information on any of them you can click on the item of interest.</li>
</ul>
</li>
<li>Health Page
<ul>
<li>Environment, hardware and managed object health and state information. Includes NCC health check status as well.</li>
</ul>
</li>
<li>VM Page
<ul>
<li>Full VM management, monitoring and CRUD (Acropolis)</li>
<li>VM monitoring (non-Acropolis)</li>
</ul>
</li>
<li>Storage Page
<ul>
<li>Container management, monitoring and CRUD</li>
</ul>
</li>
<li>Hardware
<ul>
<li>Server, disk and network management, monitoring and health. Includes cluster expansion as well as node and disk removal.</li>
</ul>
</li>
<li>Data Protection
<ul>
<li>DR, Cloud Connect and Metro Availability configuration. Management of PD objects, snapshots, replication and restore.</li>
</ul>
</li>
<li>Analysis
<ul>
<li>Detailed performance analysis for cluster and managed objects with event correlation</li>
</ul>
</li>
<li>Alerts
<ul>
<li>Local cluster and environment alerts</li>
</ul>
</li>
</ul>
<p>The home page will provide detailed information on alerts, service status, capacity, performance, tasks, and much more. To get further information on any of them you can click on the item of interest.</p>
<p>The figure shows a sample Prism Element dashboard where local cluster details are displayed:</p>
<figure class="large" id="id-DktkHEfaSr"><img alt="Prism Element - Dashboard" class="iimagesv2prismpe_dashboardpng" src="imagesv2/Prism/PE_dashboard.png">
<figcaption><span class="label">Figure 6-3. </span>Prism Element - Dashboard</figcaption>
</figure>
<div data-type="note" class="note" id="keyboard-shortcuts-53i0FXfDS9"><h6>Note</h6>
<h3>Keyboard Shortcuts</h3>
<p>Accessibility and ease of use are critical constructs in Prism. To simplify things for the end-user, a set of shortcuts has been added to allow users to do everything from their keyboard.</p>
<p>The following are some of the key shortcuts:</p>
<p>Change view (page context aware):</p>
<ul>
<li>O - Overview View</li>
<li>D - Diagram View</li>
<li>T - Table View</li>
</ul>
<p>Activities and Events:</p>
<ul>
<li>A - Alerts</li>
<li>P - Tasks</li>
</ul>