<?xml version="1.0" encoding="utf-8"?>
<search> 
  
  
    
    <entry>
      <title>AWS - Redshift</title>
      <link href="2020/12/31/markdown/AWS/AWS2021/redshift/"/>
      <url>2020/12/31/markdown/AWS/AWS2021/redshift/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc</a></p><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h2><ul><li>Massively parallel processing (MPP), shared-nothing, columnar architecture</li></ul><h2 id="best-practices-encoding-compression"><a class="markdownIt-Anchor" href="#best-practices-encoding-compression"></a> Best Practices: Encoding &amp; Compression</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=657" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=657</a></p><ul><li>Use AZ64 encoding</li></ul><h3 id="basics"><a class="markdownIt-Anchor" href="#basics"></a> Basics</h3><ul><li>Blocks: 1 MB immutable blocks, each encoded with a single encoding</li><li>Zone maps: min/max values tracked for each block</li><li>Sort keys</li></ul><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=787" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=787</a></p><h2 id="best-practices-sort-keys"><a class="markdownIt-Anchor" href="#best-practices-sort-keys"></a> Best Practices: Sort Keys</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=941" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=941</a></p><ul><li>Compound sort keys: put the lowest-cardinality columns first</li><li>Use a script to help choose the sort key</li><li>Define sort keys on large tables, using four or fewer columns</li></ul><h2 id="best-practice-materialize-columns"><a class="markdownIt-Anchor" href="#best-practice-materialize-columns"></a> Best Practice: Materialize Columns</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1001" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1001</a></p><ul><li>Frequently filtered, unchanging dimension values should be materialized within fact tables.</li></ul><h2 id="basics-slice-data-distribution"><a class="markdownIt-Anchor" href="#basics-slice-data-distribution"></a> Basics: Slices and Data Distribution</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1114" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1114</a></p><h2 id="best-practices-table-design-summary"><a class="markdownIt-Anchor" href="#best-practices-table-design-summary"></a> Best Practices: Table Design Summary</h2><blockquote></blockquote><p><a href="https://youtu.be/lj8oaSpCFTc?t=1455" target="_blank" rel="noopener">https://youtu.be/lj8oaSpCFTc?t=1455</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Redshift </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Data Analytics, Airflow</title>
      <link href="2020/12/29/markdown/AWS/AWS2021/DataAnalytics_Airflow/"/>
      <url>2020/12/29/markdown/AWS/AWS2021/DataAnalytics_Airflow/</url>
      
        <content type="html"><![CDATA[<p><a href="https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/</a></p><p><a href="https://aws.amazon.com/blogs/containers/how-affirm-uses-aws-fargate-and-apache-airflow-to-manage-batch-jobs/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/containers/how-affirm-uses-aws-fargate-and-apache-airflow-to-manage-batch-jobs/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> Airflow </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - BlogList</title>
      <link href="2020/12/29/markdown/AWS/AWS2021/awsblog-index/"/>
      <url>2020/12/29/markdown/AWS/AWS2021/awsblog-index/</url>
      
        <content type="html"><![CDATA[<blockquote></blockquote><p><a href="https://aws.amazon.com/blogs/big-data/accessing-and-visualizing-external-tables-in-an-apache-hive-metastore-with-amazon-athena-and-amazon-quicksight/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/accessing-and-visualizing-external-tables-in-an-apache-hive-metastore-with-amazon-athena-and-amazon-quicksight/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/setting-up-automated-data-quality-workflows-and-alerts-using-aws-glue-databrew-and-aws-lambda/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/setting-up-automated-data-quality-workflows-and-alerts-using-aws-glue-databrew-and-aws-lambda/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/amazon-emr-studio-preview-a-new-notebook-first-ide-experience-with-amazon-emr/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/amazon-emr-studio-preview-a-new-notebook-first-ide-experience-with-amazon-emr/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/</a></p><p><a href="https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-by-running-amazon-emr-notebooks-programmatically/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-by-running-amazon-emr-notebooks-programmatically/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS Blog </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - HPC</title>
      <link href="2020/08/05/markdown/AWS/AWS2020/Solution_HPC/"/>
      <url>2020/08/05/markdown/AWS/AWS2020/Solution_HPC/</url>
      
        <content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><ul><li>AWS re:Invent 2019: [REPEAT 1] HPC on AWS: Innovating without infrastructure constraints (CMP204-R1)</li></ul><blockquote><p><a href="https://youtu.be/g70bvcGlPY4" target="_blank" rel="noopener">https://youtu.be/g70bvcGlPY4</a></p></blockquote><ul><li>AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cloud (CMP318)</li></ul><blockquote><p><a href="https://youtu.be/x7M3m1jZ7L8" target="_blank" rel="noopener">https://youtu.be/x7M3m1jZ7L8</a></p></blockquote><ul><li><a href="https://youtu.be/0bGZdqx6w1Q" target="_blank" rel="noopener">https://youtu.be/0bGZdqx6w1Q</a></li><li><a href="https://youtu.be/tHylCR0NIwU" target="_blank" rel="noopener">https://youtu.be/tHylCR0NIwU</a></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> HPC </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - KMS</title>
      <link href="2020/08/03/markdown/AWS/AWS2020/Security_KMS/"/>
      <url>2020/08/03/markdown/AWS/AWS2020/Security_KMS/</url>
      
        <content type="html"><![CDATA[<ul><li>If you use AWS managed keys, you can’t control key rotation; they are rotated automatically every 3 years.</li><li>If you use customer managed keys (CMKs), you can turn on automatic rotation for symmetric keys; they are then rotated every year.</li><li>Symmetric CMKs and the private keys of asymmetric CMKs never leave KMS unencrypted.</li><li>How to choose between symmetric and asymmetric keys:</li></ul><blockquote><p><a href="https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-choose.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-choose.html</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> KMS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>LoRaWAN</title>
      <link href="2020/07/18/markdown/AWS/AWS2020/LoraWAN/"/>
      <url>2020/07/18/markdown/AWS/AWS2020/LoraWAN/</url>
      
        <content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><p><a href="https://youtu.be/8Oxcp9wQQnk" target="_blank" rel="noopener">https://youtu.be/8Oxcp9wQQnk</a></p><h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><h2 id="lora-vs-lorawan"><a class="markdownIt-Anchor" href="#lora-vs-lorawan"></a> LoRa vs LoRaWAN</h2><ul><li>LoRa (<strong>Lo</strong>ng <strong>Ra</strong>nge) is the radio protocol; it is the physical layer</li><li>LoRaWAN is the IoT networking solution built on LoRa technology</li></ul><h2 id="lora-procons"><a class="markdownIt-Anchor" href="#lora-procons"></a> LoRa Pros/Cons</h2><ul><li>Pros: uses open ISM frequency bands (433, 868, 915 MHz), free, no license required</li><li>Cons: susceptible to interference; low data rate</li></ul><h2 id="limitations-parameters"><a class="markdownIt-Anchor" href="#limitations-parameters"></a> Limitations / Parameters</h2><p>Target: transmit messages over about 10 km with a battery that lasts about 2 years.</p><ul><li>Frequency: check the band requirements in each country</li><li>Tx power (transmission power): 2-14 dBm / 5-20 dBm; the higher the power, the longer the distance the signal can cover</li><li>Bandwidth (125/250/500 kHz): the higher the bandwidth, the higher the data rate (more data per unit of time on air), but the shorter the range and the more susceptible the link is to interference; check the local regulations</li><li>Spreading factor (7-12): the larger the spreading factor, the longer the range and the shorter the battery life</li><li>Coding rate: 4/5, 4/6, 4/7, 4/8<br>4/5 means every 4 bits of data are sent as 5 bits, i.e. 1 redundant bit for error correction. The more redundancy (e.g. 4/8), the more robust the link over longer distances, but the longer the time on air and the shorter the battery life.</li></ul><h2 id="lora-device"><a class="markdownIt-Anchor" href="#lora-device"></a> LoRa Device</h2><ul><li><p>LoRa nodes:<br>Usually integrate the sensor, radio transceiver and microcontroller together.<br>They collect sensor data and transmit it over the air using the LoRa protocol.</p><ul><li>Examples: LoPy, LoRa GPS HAT, RN2483</li></ul></li><li><p>Gateway:<br>Receives LoRa data on multiple channels at different frequencies and forwards it to an IP network.</p><ul><li>IMST iC880A-SPI (8 channels simultaneously)</li></ul></li></ul><h2 id="lorawan"><a class="markdownIt-Anchor" href="#lorawan"></a> LoRaWAN</h2><p>Layer 3 and 4</p>]]></content>
      
      
      
        <tags>
            
            <tag> IoT </tag>
            
            <tag> LoraWAN </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Security</title>
      <link href="2020/06/16/markdown/AWS/AWS2020/WAR_Security/"/>
      <url>2020/06/16/markdown/AWS/AWS2020/WAR_Security/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/u6BCVkXkPnM" target="_blank" rel="noopener">https://youtu.be/u6BCVkXkPnM</a></p><h1 id="aws-reinforce-2019-security-best-practices-the-well-architected-way-sdd318"><a class="markdownIt-Anchor" href="#aws-reinforce-2019-security-best-practices-the-well-architected-way-sdd318"></a> AWS re:Inforce 2019: Security Best Practices the Well-Architected Way (SDD318)</h1><h2 id="incident-response"><a class="markdownIt-Anchor" href="#incident-response"></a> Incident response</h2><p><a href="https://d1.awsstatic.com/whitepapers/aws_security_incident_response.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/aws_security_incident_response.pdf</a></p><p>Playbook vs. runbook: a runbook contains more detailed steps.</p><p><a href="https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_finding-types-active.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_finding-types-active.html</a></p><ul><li>Predefined queries against CloudWatch Events</li></ul><h2 id="iam"><a class="markdownIt-Anchor" href="#iam"></a> IAM</h2><ul><li><p>SSO<br><a href="https://aws.amazon.com/blogs/security/how-to-establish-federated-access-to-your-aws-resources-by-using-active-directory-user-attributes/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/security/how-to-establish-federated-access-to-your-aws-resources-by-using-active-directory-user-attributes/</a></p></li><li><p>Permission boundaries</p></li><li><p>Automation</p></li><li><p>Cross-account access: allow a role in Account 1 to assume a role in Account 2 (hands-on)</p></li></ul><h2 id="management"><a class="markdownIt-Anchor" href="#management"></a> Management</h2><p>Detective controls</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Security </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Reference Case, API First</title>
      <link href="2020/02/23/markdown/AWS/AWS2020/APIFirst/"/>
      <url>2020/02/23/markdown/AWS/AWS2020/APIFirst/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/TKgml4bSiZA" target="_blank" rel="noopener">https://youtu.be/TKgml4bSiZA</a></p><h2 id="key-take-away"><a class="markdownIt-Anchor" href="#key-take-away"></a> Key Takeaways</h2><ul><li>No IT / business separation</li><li>Cross-functional teams</li><li>Born agile (DevOps)</li><li>TDD, automation and ChatOps</li><li>Customer-centric design</li><li>CD</li></ul><h2 id="archi"><a class="markdownIt-Anchor" href="#archi"></a> Architecture</h2><h1 id="reference-openbanking-with-hsbc"><a class="markdownIt-Anchor" href="#reference-openbanking-with-hsbc"></a> Reference: Open Banking with HSBC</h1><blockquote></blockquote><p><a href="https://youtu.be/QNM9LVV_eI0" target="_blank" rel="noopener">https://youtu.be/QNM9LVV_eI0</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> API First </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Protection Ring</title>
      <link href="2020/01/24/markdown/BackToBasic/Security/ProtectionRing/"/>
      <url>2020/01/24/markdown/BackToBasic/Security/ProtectionRing/</url>
      
        <content type="html"><![CDATA[<p><a href="https://en.wikipedia.org/wiki/Protection_ring" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/Protection_ring</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> Security </tag>
            
            <tag> Protection Rings </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Kinesis</title>
      <link href="2020/01/18/markdown/AWS/AWS2021/Kinesis/"/>
      <url>2020/01/18/markdown/AWS/AWS2021/Kinesis/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/jKPlGznbfZ0" target="_blank" rel="noopener">https://youtu.be/jKPlGznbfZ0</a></p><h2 id="why-streaming"><a class="markdownIt-Anchor" href="#why-streaming"></a> Why Streaming</h2><ul><li>Data loses value quickly over time<ul><li>“Time-critical decisions” need streaming data</li><li>Ingest data as it is generated, process it on the fly and act on it in real time (analytics/ML/alerts/actions)</li></ul></li><li>Common streaming use cases<ul><li>Smart home / automation / logs / data lake / IoT</li></ul></li><li>Real-time analytics demo (user dashboard)</li></ul><h2 id="streams-producers-and-consumers"><a class="markdownIt-Anchor" href="#streams-producers-and-consumers"></a> Stream Producers and Consumers</h2><h3 id="producer-limits"><a class="markdownIt-Anchor" href="#producer-limits"></a> Producer limits</h3><ul><li>Bandwidth limit: 1 MB/sec/shard</li><li>Record limit: 1,000 records/sec/shard; if many small records hit this limit before the bandwidth limit, aggregate them into larger records</li></ul><h3 id="normal-consumer"><a class="markdownIt-Anchor" href="#normal-consumer"></a> Normal consumer</h3><ul><li><p>The slowest consumer also affects the number of shards: you might need to increase the shard count so that the slowest consumer can process messages in parallel and keep up with all of them</p></li><li><p>The fastest you can read from a shard is one read transaction every 200 ms</p></li><li><p>Multiple consumers share the per-shard read limits of 5 transactions/sec and 2 MB/sec</p><ul><li>Adding consumers therefore decreases the throughput of each one and increases latency</li></ul></li><li><p>Workaround: use a master stream and copy it into slave streams</p></li></ul><h3 id="enhanced-fan-out"><a class="markdownIt-Anchor" href="#enhanced-fan-out"></a> Enhanced Fan-out</h3><ul><li>Uses HTTP/2: consumers subscribe and data is pushed to them</li><li>Each consumer gets a dedicated 2 MB/sec/shard; message latency can be 15 ms</li></ul><h2 id="comcast-streaming"><a class="markdownIt-Anchor" href="#comcast-streaming"></a> Comcast streaming</h2><ul><li>As a platform, design topics that teams can use to share data and communicate</li><li>Use API Gateway to register streams</li></ul><p><a href="https://comcastsamples.github.io/KinesisShardCalculator/" target="_blank" rel="noopener">https://comcastsamples.github.io/KinesisShardCalculator/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Kinesis </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Design MQTT Topics for AWS IoT Core</title>
      <link href="2020/01/02/markdown/AWS/AWS2020/BestPracticesMQTT/"/>
      <url>2020/01/02/markdown/AWS/AWS2020/BestPracticesMQTT/</url>
      
        <content type="html"><![CDATA[<h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><p><a href="https://d1.awsstatic.com/whitepapers/Designing_MQTT_Topics_for_AWS_IoT_Core.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/Designing_MQTT_Topics_for_AWS_IoT_Core.pdf</a></p><h1 id="mqtt-communication-patterns"><a class="markdownIt-Anchor" href="#mqtt-communication-patterns"></a> MQTT Communication Patterns</h1><ul><li>Point to Point<ul><li>each device subscribes to the topic relevant to itself</li></ul></li><li>Broadcast<ul><li>multiple devices subscribe to the same topic</li></ul></li><li>Fan-in<ul><li>multiple devices publish to the same topic</li><li>avoid fan-in to a single end device; instead, use fan-in to route messages from a large fleet through the AWS IoT Rules Engine<ul><li>because routing a fleet's messages to a single device may hit a non-adjustable limit on that device's MQTT connection (!!!)</li></ul></li></ul></li></ul><h1 id="mqtt-communication-patterns-2"><a class="markdownIt-Anchor" href="#mqtt-communication-patterns-2"></a> MQTT Communication Patterns</h1><ul><li>device to device</li><li>device to cloud</li><li>cloud to device<ul><li>include session information for tracking purposes</li></ul></li><li>device to/from users</li></ul><h1 id="mqtt-design-best-practices"><a class="markdownIt-Anchor" href="#mqtt-design-best-practices"></a> MQTT Design Best Practices</h1><h2 id="general-best-practices"><a class="markdownIt-Anchor" href="#general-best-practices"></a> General Best Practices</h2><ul><li>use only lowercase letters, numbers and dashes in topic levels</li><li>order topic levels from general to specific</li><li>include any relevant routing information in the topic</li><li>use a <strong>prefix</strong> to distinguish data topics from command topics</li><li>document the topic structure as an operational practice</li><li>use the IoT Thing name as the MQTT client ID, which makes it easy to correlate for logging and policy purposes</li><li>include the Thing name in any MQTT message published by a thing or sent to a specific thing</li><li>review the limits<ul><li><a href="https://docs.aws.amazon.com/general/latest/gr/iot-core.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/general/latest/gr/iot-core.html</a></li></ul></li><li>include contextual information in message payloads</li><li>avoid fan-in to a single device: do not allow a single device to subscribe to a shared topic (!!!)</li><li>never allow a device to subscribe to all topics (#); use the single-level wildcard (+) for IoT rules</li></ul><h2 id="best-practices-for-telemetry"><a class="markdownIt-Anchor" href="#best-practices-for-telemetry"></a> Best Practices for Telemetry</h2><ul><li>IoT Basic Ingest for telemetry<ul><li>the topic routes messages directly to specific rules in the Rules Engine (no device-to-device messaging needed)</li></ul></li><li>Traditional MQTT topics</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dt/&lt;application&gt;/&lt;context&gt;/&lt;thing-name&gt;/&lt;dt-type&gt;</span><br></pre></td></tr></table></figure><p>&lt;application&gt;: useful for switching versions<br>&lt;context&gt;: grouping, for example a device group ID<br>&lt;dt-type&gt;: subcomponent of the device, e.g. a sensor</p><h2 id="best-practices-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-commands"></a> Best Practices for Commands</h2><ul><li><p>IoT Shadow</p></li><li><p>AWS IoT Shadow is the preferred AWS IoT service for implementing individual device commands.</p></li><li><p>AWS IoT Jobs should be used for fleet-wide operations, as it provides extra benefits, such as Amazon CloudWatch metrics for job tracking and the ability to track multiple in-transit jobs for a single device.</p></li><li><p>You can use a combination of the AWS IoT Shadow, AWS IoT job documents, and standard MQTT topics to support your command use cases.</p></li></ul><h2 id="best-practices-for-using-the-aws-iot-shadow"><a class="markdownIt-Anchor" href="#best-practices-for-using-the-aws-iot-shadow"></a> Best Practices for Using the AWS IoT Shadow</h2><ul><li>Don’t share shadows</li><li>Use the shadow for infrequent state changes or commands that happen on the scale of minutes, hours or days</li><li>Use the shadow to store device status metrics</li><li>Use the shadow to store the firmware version (major.minor.patch)</li><li>Use the clientToken field for tracking purposes</li></ul><h2 id="best-practices-for-using-iot-jobs-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-using-iot-jobs-for-commands"></a> Best Practices for Using IoT Jobs for Commands</h2><p>An IoT job contains instructions that the thing must run to complete its transaction.</p><ul><li>Use thing groups with AWS IoT Jobs<ul><li>e.g. update all things running a certain firmware</li></ul></li><li>Use staged rollouts with AWS IoT Jobs</li></ul><h2 id="best-practices-for-using-mqtt-topics-for-commands"><a class="markdownIt-Anchor" href="#best-practices-for-using-mqtt-topics-for-commands"></a> Best Practices for Using MQTT Topics for Commands</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cmd/&lt;application&gt;/&lt;context&gt;/&lt;destination-id&gt;/&lt;req-type&gt;</span><br><span class="line">cmd/&lt;application&gt;/&lt;context&gt;/&lt;destination-id&gt;/&lt;res-type&gt;</span><br></pre></td></tr></table></figure><ul><li>Command payload syntax<ul><li>session ID</li><li>response-topic</li></ul></li></ul><h1 id="applications-on-aws"><a class="markdownIt-Anchor" href="#applications-on-aws"></a> Applications on AWS</h1>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> IoT </tag>
            
            <tag> MQTT </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - EC2</title>
      <link href="2019/09/17/markdown/AWS/AWS2019/EC2/"/>
      <url>2019/09/17/markdown/AWS/AWS2019/EC2/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/cb0KvqGjXRE" target="_blank" rel="noopener">https://youtu.be/cb0KvqGjXRE</a></p></blockquote><h2 id="ec2"><a class="markdownIt-Anchor" href="#ec2"></a> EC2</h2><ul><li>AWS’s vision for EC2: a compute platform for the world, with continuous innovation</li><li>New OS for EC2: Amazon Linux 2<ul><li>5 years of support</li><li>Can also be used on premises</li></ul></li><li>EC2 supports Windows<ul><li>Most Windows workloads in the cloud run on AWS</li></ul></li><li>Bring your own license (BYOL): use AWS License Manager</li><li>Instances specifically optimized for SAP</li></ul><h3 id="deep-dive"><a class="markdownIt-Anchor" href="#deep-dive"></a> Deep dive</h3><ul><li>AWS Nitro System: accelerates the hypervisor layer</li><li>Firecracker: used by Lambda</li></ul><h2 id="serverless"><a class="markdownIt-Anchor" href="#serverless"></a> Serverless</h2><ul><li>Lambda functions are invoked trillions of times per month</li></ul><h2 id="storage"><a class="markdownIt-Anchor" href="#storage"></a> Storage</h2><ul><li>S3 Intelligent-Tiering: automatically moves data between access tiers based on access patterns</li><li>S3 Glacier Deep Archive: new product, 70% cheaper than Glacier</li></ul><h2 id="hibernate-on-demand"><a class="markdownIt-Anchor" href="#hibernate-on-demand"></a> Hibernate On-Demand</h2><h2 id="predictive-scaling-scale-for-you-to-more-cater-for-your-spike-need"><a class="markdownIt-Anchor" href="#predictive-scaling-scale-for-you-to-more-cater-for-your-spike-need"></a> Predictive Scaling: scales ahead of time to cater for spikes in demand</h2><h2 id="reference-case"><a class="markdownIt-Anchor" href="#reference-case"></a> Reference Case</h2><ul><li>Small companies can compete with large studios by using AWS (Thinkbox)</li></ul><h2 id="hybrid"><a class="markdownIt-Anchor" href="#hybrid"></a> Hybrid</h2><ul><li>AWS Outposts<ul><li>The compute capacity shows up in your VPC</li><li>Two variants: VMware and AWS native</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> EC2 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Hands-on Best Practice</title>
      <link href="2019/09/13/markdown/AWS/AWS2019/Handson_bestPractise/"/>
      <url>2019/09/13/markdown/AWS/AWS2019/Handson_bestPractise/</url>
      
        <content type="html"><![CDATA[<h1 id="cloudformation"><a class="markdownIt-Anchor" href="#cloudformation"></a> CloudFormation</h1><ul><li>Define security groups in a separate stack from the servers</li><li>Otherwise the server stack cannot be deleted while its security group is still referenced by other servers</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Best Practice </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Keynotes</title>
      <link href="2019/08/27/markdown/AWS/AWS2019/Keynote/"/>
      <url>2019/08/27/markdown/AWS/AWS2019/Keynote/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/femopq3JWJg" target="_blank" rel="noopener">https://youtu.be/femopq3JWJg</a></p></blockquote><h1 id="redesign-of-the-db-architecture"><a class="markdownIt-Anchor" href="#redesign-of-the-db-architecture"></a> Redesign of the DB architecture</h1><h2 id="history-of-aurora"><a class="markdownIt-Anchor" href="#history-of-aurora"></a> History of Aurora</h2><ul><li>Cell-based architectures<ul><li>Shared storage</li><li>Tolerates AZ+1 failures (a whole AZ plus one more node)</li></ul></li><li>The log is the database<ul><li>Across AZs and shards, it’s the log that is moved, not the data.</li></ul></li><li>The change happens at the storage layer, which was redesigned to be database-aware.</li></ul><h2 id="history-of-dynamodb"><a class="markdownIt-Anchor" href="#history-of-dynamodb"></a> History of DynamoDB</h2><ul><li>Analysis showed that 70% of queries to relational DBs were just key-value lookups.</li></ul><blockquote><p>DYNAMO</p></blockquote><ul><li>Features<ul><li>automatic re-sharding</li><li>Database Migration Service (from Oracle to DynamoDB)</li></ul></li></ul><h2 id="basic-knowledge-with-aurora-sharding"><a class="markdownIt-Anchor" href="#basic-knowledge-with-aurora-sharding"></a> Basics of Aurora sharding</h2><ul><li><p>3 copies across 3 AZs is not enough, so Aurora uses 6 copies across 3 AZs</p><ul><li>V = 6 (each piece of data has 6 copies in total) means 6 nodes hold the same data; a write needs more than 3 of them (i.e. at least 4) to be alive.</li></ul></li><li><p>When a sharded database keeps V copies of the data (on V nodes), the write quorum Vw and the read quorum Vr follow from these rules:</p><ul><li>If V = 6 (a cluster of 6 nodes): V/2 = 3 and Vw &gt; 3, so Vw = 4, i.e. a write must reach 4 nodes; Vw + Vr &gt; V means 4 + Vr &gt; 6, so Vr = 3</li></ul><blockquote><p>Vw + Vr &gt; V<br>Vw &gt; V/2</p></blockquote></li></ul><h1 id="data-lake"><a class="markdownIt-Anchor" href="#data-lake"></a> Data Lake</h1><ul><li>S3 handles 60 terabits/sec in a single region</li><li>Culture of durability</li><li>11 9s of durability: time to failure vs. time to repair</li></ul><h1 id="1-nov-2018-worlds-largest-oracle-dw-to-redshift"><a class="markdownIt-Anchor" href="#1-nov-2018-worlds-largest-oracle-dw-to-redshift"></a> 1 Nov 2018 - World’s largest Oracle DW to Redshift</h1><ul><li>Redshift concurrency scaling<ul><li>consistently fast with thousands of concurrent queries.</li></ul></li></ul><h1 id="demo-fender-music"><a class="markdownIt-Anchor" href="#demo-fender-music"></a> Demo - Fender Music</h1><h1 id="serverless"><a class="markdownIt-Anchor" href="#serverless"></a> Serverless</h1><ul><li>Lambda handles trillions of requests per month</li><li>Work is spread randomly across multiple servers</li><li>Lambda Layers (share binaries between different Lambda functions)</li><li>Nested applications with Lambda</li></ul><h1 id="stepfunction"><a class="markdownIt-Anchor" href="#stepfunction"></a> Step Functions</h1><ul><li>Services that can be orchestrated by Step Functions<ul><li>Batch, ECS, Fargate, Glue, DynamoDB, SNS, SQS, SageMaker</li></ul></li></ul><h1 id="api-gateway"><a class="markdownIt-Anchor" href="#api-gateway"></a> API Gateway</h1><ul><li>WebSocket support for API Gateway</li><li>Move workloads from EC2 to serverless without changing the API</li></ul><h2 id="kinesis-and-managed-streaming-for-kafka"><a class="markdownIt-Anchor" href="#kinesis-and-managed-streaming-for-kafka"></a> Kinesis and Managed Streaming for Kafka</h2><ul><li>Video and audio are becoming streaming data</li><li>Kinesis family</li></ul><h1 id="demo-nab"><a class="markdownIt-Anchor" href="#demo-nab"></a> Demo - NAB</h1><ul><li>Culture: you build it, you fix it (craftsmanship)</li><li>35% of applications in the cloud by 2020</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Gazebo</title>
      <link href="2019/08/16/markdown/Trending/Robotics/Robotics/"/>
      <url>2019/08/16/markdown/Trending/Robotics/Robotics/</url>
      
        <content type="html"><![CDATA[<h1 id="gazebo"><a class="markdownIt-Anchor" href="#gazebo"></a> Gazebo</h1><p><a href="http://gazebosim.org/" target="_blank" rel="noopener">http://gazebosim.org/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> Robotics </tag>
            
            <tag> Gazebo </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudMap</title>
      <link href="2019/08/15/markdown/AWS/AWS2019/CloudMap/"/>
      <url>2019/08/15/markdown/AWS/AWS2019/CloudMap/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/fMGd9IUaotE" target="_blank" rel="noopener">https://youtu.be/fMGd9IUaotE</a></p><h2 id="service-registers"><a class="markdownIt-Anchor" href="#service-registers"></a> Service registries</h2><ul><li>ZooKeeper, Eureka, SmartStack, SkyDNS, Doozerd, etcd, etc.</li><li>Cloud Map: a dynamic map of your cloud</li></ul><h1 id="issue-try-to-solve"><a class="markdownIt-Anchor" href="#issue-try-to-solve"></a> Issues it tries to solve</h1><ul><li><p>Attribute-based service discovery in complex service environments</p><ul><li>Multiple stages</li><li>Multiple versions</li><li>Multiple statuses</li></ul></li><li><p>Handle partial failure</p><ul><li>Provisions Route 53 for you to help handle partial failure</li></ul></li></ul><h1 id="integrate-with-existing-aws-service"><a class="markdownIt-Anchor" href="#integrate-with-existing-aws-service"></a> Integration with existing AWS services</h1><ul><li>CloudFormation</li><li>IAM</li></ul><h2 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">dig +short A backend.cloudmapdemo.com</span><br><span class="line">172.31.1.228</span><br><span class="line">172.31.0.100</span><br></pre></td></tr></table></figure><h2 id="work-with-consul"><a class="markdownIt-Anchor" href="#work-with-consul"></a> Work with Consul</h2><ul><li>Use AWS Cloud Map with Consul to extend hybrid infrastructure to multiple regions</li></ul><blockquote></blockquote><p><a href="https://www.youtube.com/watch?v=fMGd9IUaotE&amp;list=PL72BC_ThTrzW0wfjYWsPIG-sRb920Ubs3&amp;index=24&amp;t=614s" target="_blank" rel="noopener">https://www.youtube.com/watch?v=fMGd9IUaotE&amp;list=PL72BC_ThTrzW0wfjYWsPIG-sRb920Ubs3&amp;index=24&amp;t=614s</a></p><h2 id="feeling"><a class="markdownIt-Anchor" href="#feeling"></a> Impressions</h2><ul><li>A solution based on Route 53.</li><li>A service discovery service.</li><li>Given a namespace and a service name, it returns a list of service endpoints.</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CloudMap </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - DotNet</title>
      <link href="2019/08/15/markdown/AWS/AWS2019/DotNet/"/>
      <url>2019/08/15/markdown/AWS/AWS2019/DotNet/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/FteCJQcTDc4" target="_blank" rel="noopener">https://youtu.be/FteCJQcTDc4</a></p><h2 id="modern-net-applications-on-aws"><a class="markdownIt-Anchor" href="#modern-net-applications-on-aws"></a> Modern .NET applications on AWS</h2><p>Mosaic image</p><ul><li>Services used: .NET tooling, Lambda, X-Ray, ECR, Fargate, DynamoDB, Cognito, S3, CodePipeline, SQS, Step Functions, AWS Batch, SSM Parameter Store, CloudFormation</li></ul><h3 id="demo-use-visual-studio-to-cicd"><a class="markdownIt-Anchor" href="#demo-use-visual-studio-to-cicd"></a> Demo: use Visual Studio for CI/CD</h3><ul><li>AWS Batch<ul><li>Works as a job queue; can use EC2 Spot Instances</li></ul></li><li>From Visual Studio, you can publish the code directly to build a Docker image and push it to Amazon ECR<ul><li>The code downloads the picture and uploads it to the corresponding raw folder in S3</li></ul></li><li>Publish Lambda functions directly from Visual Studio<ul><li>In Lambda, enabling X-Ray lets you drill down into the details of each invocation</li></ul></li><li>CodePipeline</li><li>Use Step Functions to chain all the functions together</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> DotNet </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - MachineLearning</title>
      <link href="2019/08/14/markdown/AWS/AWS2019/MachineLearning/"/>
      <url>2019/08/14/markdown/AWS/AWS2019/MachineLearning/</url>
      
        <content type="html"><![CDATA[<h1 id="hands-on"><a class="markdownIt-Anchor" href="#hands-on"></a> Hands-on</h1><p><a href="https://s3.amazonaws.com/solutions-reference/predictive-maintenance-using-machine-learning/latest/predictive-maintenance-using-machine-learning.pdf" target="_blank" rel="noopener">https://s3.amazonaws.com/solutions-reference/predictive-maintenance-using-machine-learning/latest/predictive-maintenance-using-machine-learning.pdf</a></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Couldn&apos;t call &apos;describe_notebook_instance&apos; to get the Role ARN of the instance PredictiveMaintenanceNotebookInstance.</span><br></pre></td></tr></table></figure><p>Fix: update the IAM role attached to the SageMaker notebook instance.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ResourceLimitExceeded</span><br></pre></td></tr></table></figure><p>Fix: change to <code>train_instance_type = 'ml.p2.xlarge'</code>.</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/GW0Bktm55nI" target="_blank" rel="noopener">https://youtu.be/GW0Bktm55nI</a></p><h2 id="aws-machine-learning-stack"><a class="markdownIt-Anchor" href="#aws-machine-learning-stack"></a> AWS Machine Learning Stack</h2><ul><li>ML Frameworks &amp; Infrastructures<ul><li>Frameworks<ul><li>TensorFlow (85% of TensorFlow workloads in the cloud run on AWS)</li><li>Apache MXNet: deep learning for enterprise developers; linearly scalable</li><li>PyTorch: from Facebook; flexible, versatile and portable</li><li>AWS is framework agnostic</li></ul></li></ul></li><li>ML Services<ul><li>SageMaker workflows</li><li>SageMaker Ground Truth</li><li>Use SageMaker for reinforcement learning (RL), for example vehicle routing</li><li>SageMaker Neo (open source)<ul><li>Accelerates the machine learning cycle</li><li>CI/CD</li><li>Optimizes across different frameworks</li></ul></li></ul></li><li>AI Services<ul><li>Textract</li></ul></li></ul><h2 id="ge-healthcare-demo"><a class="markdownIt-Anchor" href="#ge-healthcare-demo"></a> GE Healthcare Demo</h2><ul><li>Neural network compression: remove layers and retrain the model</li><li>How to achieve network compression using AWS services<ul><li>Use SageMaker RL<ul><li>State: current network architecture</li><li>Action: remove a layer or not</li><li>Reward: accuracy + compression ratio</li></ul></li><li>Result: 40% smaller model with 1%-2% loss of accuracy</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Machine Learning </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Encryption</title>
      <link href="2019/08/14/markdown/AWS/AWS2019/Security/"/>
      <url>2019/08/14/markdown/AWS/AWS2019/Security/</url>
      
        <content type="html"><![CDATA[<h1 id="reference-s3-sse-kms"><a class="markdownIt-Anchor" href="#reference-s3-sse-kms"></a> Reference - S3 SSE-KMS</h1><blockquote><p><a href="https://youtu.be/jZYkJf-9yXI" target="_blank" rel="noopener">https://youtu.be/jZYkJf-9yXI</a></p></blockquote><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/Encryption_KMS.PNG?raw=true" alt="Encryption_KMS.PNG"></p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/Encryption_KMS_1.PNG?raw=true" alt="Encryption_KMS_1.PNG"></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Encryption </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Server Hardware</title>
      <link href="2019/08/13/markdown/BackToBasic/Hardware/"/>
      <url>2019/08/13/markdown/BackToBasic/Hardware/</url>
      
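        <content type="html"><![CDATA[<p>HPE Ethernet 10Gb 2-port 562FLR-SFP+ Adapter</p><p>FLR: FlexibleLOM for rack servers (a LAN-on-motherboard style module)<br>SFP: small form-factor pluggable transceiver port (fiber or copper)<br>SFP+: enhanced SFP supporting 10 Gb/s per port</p>]]></content>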
      
      
      
        <tags>
            
            <tag> hardware </tag>
            
            <tag> Server </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - ELB</title>
      <link href="2019/08/13/markdown/AWS/AWS2019/ELB/"/>
      <url>2019/08/13/markdown/AWS/AWS2019/ELB/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/VIgAT7vjol8" target="_blank" rel="noopener">https://youtu.be/VIgAT7vjol8</a></p></blockquote><h2 id="elastic-load-balancing-deep-dive-and-best-practices-2018"><a class="markdownIt-Anchor" href="#elastic-load-balancing-deep-dive-and-best-practices-2018"></a> Elastic Load Balancing: Deep Dive and Best Practices - 2018</h2><ul><li><p>Differences between Layer 4 and Layer 7 load balancing</p><ul><li>Layer 4 supports TCP; Layer 7 supports only HTTP and HTTPS (and terminates TLS)</li><li>Layer 7: connections are terminated and pooled</li><li>Layer 7: headers can be modified</li><li>Layer 7: the X-Forwarded-For HTTP header is modified</li></ul></li><li><p>Product mapping</p><ul><li>Application Load Balancer (ALB) is a Layer 7 LB; Network Load Balancer (NLB) is a Layer 4 LB</li></ul></li></ul><h2 id="alb"><a class="markdownIt-Anchor" href="#alb"></a> ALB</h2><ul><li>ALB supports path- and host-based routing (a single load balancer dispatches all traffic); deep integration with EKS for microservice architectures</li><li>ALB supports redirects, fixed responses, slow start (configurable, e.g. 10 min), and both IPv4 and IPv6</li><li>Updating certificates on ALB<ul><li>Use IAM to control who can update them</li><li>Use ACM (AWS Certificate Manager) to deploy and rotate certificates on ALB directly</li></ul></li><li>Integrates with AWS WAF</li><li>Server Name Indication (SNI): load balance multiple applications that use multiple certificates</li><li>Authentication at the ALB layer (OIDC, Cognito, SAML)</li><li>Multi-AZ by default, with no extra bandwidth charge</li><li>Absorbs the impact of DNS caching (?)</li><li>Health checks: HTTP status codes are recommended; they work with Auto Scaling</li></ul><h2 id="nlb"><a class="markdownIt-Anchor" href="#nlb"></a> NLB</h2><ul><li>Millions of requests per second</li><li>One static IP per AZ<ul><li>Firewall example: two layers of NLB; fewer static IPs simplify the firewall config</li><li>Route 53 routes to the static IP addresses in the different AZs.</li></ul></li><li>Supports Proxy Protocol v2</li><li>CloudWatch metrics for NLB; flow logs are available</li></ul><h2 id="netflix-demo-identity-platform"><a class="markdownIt-Anchor" href="#netflix-demo-identity-platform"></a> Netflix Demo – Identity Platform</h2><ul><li>Workforce Identity-as-a-Service</li><li>Federate All The Things</li><li>Developer Self-Service<ul><li>SSO; SAML, OAuth2</li></ul></li></ul><h3 id="challenging-with-identity-solution"><a class="markdownIt-Anchor" href="#challenging-with-identity-solution"></a> Challenges with identity solutions</h3><ul><li>Constantly catching up with new languages and frameworks</li><li>Open-source libraries of varying quality</li><li>Developer friction around configuration</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/ALB_OpenIDSupport.PNG?raw=true" alt="ALB_OpenIDSupport.PNG"></p><ul><li>Spinnaker</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> ELB </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - DevOps 2019</title>
      <link href="2019/08/06/markdown/AWS/AWS2019/DevOps/"/>
      <url>2019/08/06/markdown/AWS/AWS2019/DevOps/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p>Empowering DevOps for Secure by Design (see it live)</p><blockquote><p><a href="https://youtu.be/8UG9E5moCdo" target="_blank" rel="noopener">https://youtu.be/8UG9E5moCdo</a></p></blockquote><ul><li>Workloads are provisioned in minutes, so security also needs to be addressed in minutes.<ul><li>Automated security provisioning</li><li>Secure-by-Design</li></ul></li><li>IBM CloudDeployment Services: multi-cloud support</li></ul><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><p>Enterprise DevOps: Patterns of Efficiency</p><blockquote><p><a href="https://youtu.be/qyhuMDozWXk" target="_blank" rel="noopener">https://youtu.be/qyhuMDozWXk</a></p></blockquote><h2 id="devops-vs-itil-devops-vs-cicd-enterprise-devops-vs-devops-for-startups"><a class="markdownIt-Anchor" href="#devops-vs-itil-devops-vs-cicd-enterprise-devops-vs-devops-for-startups"></a> DevOps vs ITIL; DevOps vs CI/CD; Enterprise DevOps vs DevOps for Startups</h2><ul><li>DevOps shares core values with ITIL</li><li>Enterprise DevOps<ul><li>Insource value creation</li><li>Apply DevOps to legacy apps</li><li>Culture of inclusion</li></ul></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/DevOps_EnterpriseDevOps.PNG?raw=true" alt="DevOps_EnterpriseDevOps"></p><h2 id="enterprise-devops-case-study-nab"><a class="markdownIt-Anchor" href="#enterprise-devops-case-study-nab"></a> Enterprise DevOps Case study: NAB</h2><ul><li>Outsourcing everything resulted in a loss of innovation capability</li><li>Automating for successful DevOps</li></ul><h2 id="enterprise-devops-case-study-vendor"><a class="markdownIt-Anchor" href="#enterprise-devops-case-study-vendor"></a> Enterprise DevOps Case study: Vendor</h2><ul><li>Migrate to the cloud quickly and securely</li><li>Security is not a roadblock</li><li>Challenges of security at scale: automation, tools and SMEs</li><li>Preventative; detective; remediation. Try to shift left (earlier)<ul><li>AWS Service Catalog</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> DevOps </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Digital Transformation</title>
      <link href="2019/08/05/markdown/AWS/AWS2019/DigitalTransformation/"/>
      <url>2019/08/05/markdown/AWS/AWS2019/DigitalTransformation/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/4Gr7hv24jK4" target="_blank" rel="noopener">https://youtu.be/4Gr7hv24jK4</a></p></blockquote><h2 id="culture-skills-organization-finance"><a class="markdownIt-Anchor" href="#culture-skills-organization-finance"></a> Culture, Skills, Organization, Finance</h2><ul><li><strong>Culture</strong><ul><li><strong>If you want to build a ship, don’t drum up the people to gather the wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.</strong></li><li>Use good judgement instead of process (security, flexibility, HA)</li><li>Ahead in the Cloud: Best Practices for Navigating the Future of Enterprise IT (book)</li><li>A Seat at the Table (book)</li></ul></li><li><strong>Skill</strong><ul><li>Training and compensation</li><li>Recommended book: Powerful</li></ul></li><li><strong>Organization</strong><ul><li>Move from projects to product teams<ul><li>CD; DevOps, “run what you wrote”; reduce tech debt and lock-in</li><li>The Phoenix Project; The DevOps Handbook</li></ul></li></ul></li><li><strong>Capex vs Opex</strong><ul><li>CTO or CFO: who decides the IT structure?</li><li>With the cloud, it’s hard to stay Capex (pay as you go)</li></ul></li></ul><h2 id="pathway-to-digital-transformation"><a class="markdownIt-Anchor" href="#pathway-to-digital-transformation"></a> Pathway to digital transformation</h2><ul><li>Time to value: try to do simple things quickly<ul><li>Elite companies are 2,555 times faster than slow companies</li></ul></li><li>Distributed optimized capacity<ul><li>Scale, HA, cost-optimized; cloud native</li></ul></li><li>Critical workloads and data center replacement: strategic<ul><li>Who runs the “fire drill” for IT?<ul><li><strong>Chaos Engineering</strong> (Book)</li></ul></li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Digital Transformation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - VPC</title>
      <link href="2019/08/05/markdown/AWS/AWS2019/VPC/"/>
      <url>2019/08/05/markdown/AWS/AWS2019/VPC/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/ar6sLmJ45xs" target="_blank" rel="noopener">https://youtu.be/ar6sLmJ45xs</a></p></blockquote><ul><li>North-South:</li><li>East-West:</li></ul><h2 id="challenge-with-current-vpc-architecture"><a class="markdownIt-Anchor" href="#challenge-with-current-vpc-architecture"></a> Challenges with the current VPC architecture</h2><ul><li>Many VPCs, many connections and many peerings<ul><li>VPC peering: not transitive</li><li>Transit VPC (e.g. VPCs 10.1.0.0/16 and 10.2.0.0/16 communicate through a transit VPC 10.0.0.0/16)</li><li>Transit Gateway (2018)</li></ul></li></ul><h2 id="transit-gateway-2018-tgw"><a class="markdownIt-Anchor" href="#transit-gateway-2018-tgw"></a> Transit Gateway (2018) – tgw</h2><ul><li><p>Centralize VPN and AWS Direct Connect</p></li><li><p>Up to 5,000 VPCs across accounts</p></li><li><p>Flexible</p><ul><li>Control segmentation and sharing with routing</li></ul></li><li><p>Compared with transit VPC</p><ul><li>A built-in AWS service</li></ul></li><li><p>AWS HyperPlane</p><ul><li>Backbone of NLB, NAT Gateway, EFS and now Transit Gateway</li><li>Region-wide scope</li></ul></li></ul><h3 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_flat.PNG?raw=true" alt="vpc_flat"></p><ul><li>Flat: every VPC can talk to every other VPC.</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_isolated.PNG?raw=true" alt="vpc_isolated"></p><ul><li>Isolated (VPN): all traffic needs to go through the VPN</li></ul><h3 id="reference-network-architecture"><a class="markdownIt-Anchor" href="#reference-network-architecture"></a> Reference Network Architecture</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/VPC_tgw_reference_arch.PNG?raw=true" alt="vpc_arch"></p><h3 id="new-feature-vpc-sharing-and-resource-access-manager"><a class="markdownIt-Anchor" href="#new-feature-vpc-sharing-and-resource-access-manager"></a> New Feature: VPC Sharing and Resource Access Manager</h3><ul><li>An external account manages the public subnets</li><li>An internal account manages the private subnets</li><li>Sharing a VPC across accounts makes resources more flexible and avoids VPC peering in some cases</li></ul><h3 id="segmentation-considerations"><a class="markdownIt-Anchor" href="#segmentation-considerations"></a> Segmentation considerations</h3><ul><li>Security groups and IAM are effective and proven</li><li>Shared VPCs vs VPC peering: a shared VPC can span multiple accounts</li><li>Separate VPCs + Transit Gateway: simplest design without scaling issues (peering, VPCs, routes)</li></ul><h3 id="sharing-considerations"><a class="markdownIt-Anchor" href="#sharing-considerations"></a> Sharing considerations</h3><ul><li>VPC peering (max 100 VPCs); supports inter-region peering</li><li>AWS PrivateLink: supports overlapping CIDRs (using ELB)</li><li>AWS Transit VPC: shared services as a spoke</li><li>Transit Gateway: the most advanced option</li></ul><h3 id="connecting-to-on-premises"><a class="markdownIt-Anchor" href="#connecting-to-on-premises"></a> Connecting to on-premises</h3><ul><li>Virtual Private Gateway VPN</li><li>Direct Connect</li><li>Customer VPN</li><li>Transit Gateway VPN</li></ul><h3 id="43min-an-advanced-use-case"><a class="markdownIt-Anchor" href="#43min-an-advanced-use-case"></a> 43min: an advanced use case (???)</h3><h3 id="reminder"><a class="markdownIt-Anchor" href="#reminder"></a> Reminder</h3><ul><li>Moving existing DMZs to the cloud might not be a good idea</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Transit VPC </tag>
            
            <tag> Transit Gateway </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudFormation 2019</title>
      <link href="2019/08/01/markdown/AWS/AWS2019/CloudFormation/"/>
      <url>2019/08/01/markdown/AWS/AWS2019/CloudFormation/</url>
      
        <content type="html"><![CDATA[<h1 id="whats-new"><a class="markdownIt-Anchor" href="#whats-new"></a> What’s New</h1><ul><li>More resource types, including Alexa and custom resources</li></ul><h2 id="managing-enterprise-complexity"><a class="markdownIt-Anchor" href="#managing-enterprise-complexity"></a> Managing enterprise complexity</h2><ul><li>Seamless handling of secrets</li><li>StackSets: overrides</li></ul><h2 id="improved-handling-of-secrets"><a class="markdownIt-Anchor" href="#improved-handling-of-secrets"></a> Improved handling of secrets</h2><ul><li>Use dynamic references to resolve parameters from SSM Parameter Store or Secrets Manager</li></ul><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">MasterUsername:</span> <span class="string">'&#123;&#123;resolve:secretsmanager:MyRDSSecrets:SecretString:username&#125;&#125;'</span></span><br></pre></td></tr></table></figure><ul><li><p>AWS CloudFormation Macros</p><ul><li>Iteration</li><li>Transformation</li></ul></li><li><p>CloudFormation Linter</p><ul><li>Scripted --&gt; Declarative --&gt; DSLs --&gt; Imperative</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CloudFormation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudFormation 2019</title>
      <link href="2019/08/01/markdown/AWS/AWS2019/Whitepapers_BigDataOptions/"/>
      <url>2019/08/01/markdown/AWS/AWS2019/Whitepapers_BigDataOptions/</url>
      
        <content type="html"><![CDATA[<blockquote><p><a href="https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf" target="_blank" rel="noopener">https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf</a></p></blockquote><h1 id="amazon-kinesis"><a class="markdownIt-Anchor" href="#amazon-kinesis"></a> Amazon Kinesis</h1><ul><li>Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data.<ul><li>Capture and store <strong>terabytes</strong> of data per hour from <strong>hundreds of thousands of sources</strong></li><li>Consumers using the Kinesis Client Library store a cursor (checkpoint) in DynamoDB</li></ul></li><li>Amazon Kinesis Video Streams enables you to build custom applications that process or analyze streaming video.</li><li>Amazon Kinesis Data Firehose enables you to deliver real-time streaming data to AWS destinations such as Amazon S3, Amazon Redshift, Amazon Kinesis Analytics, and Amazon Elasticsearch Service.</li><li>Amazon Kinesis Data Analytics enables you to process and analyze streaming data with standard SQL.</li></ul><h1 id="lambda"><a class="markdownIt-Anchor" href="#lambda"></a> Lambda</h1><ul><li>Default concurrency limit is 1,000 concurrent executions per region</li></ul><h2 id="anti-pattern"><a class="markdownIt-Anchor" href="#anti-pattern"></a> Anti-pattern</h2><ul><li>Long-running applications</li><li>Dynamic websites</li><li>Stateful applications</li></ul><h1 id="emr"><a class="markdownIt-Anchor" href="#emr"></a> EMR</h1><h2 id="anti-pattern-2"><a class="markdownIt-Anchor" href="#anti-pattern-2"></a> Anti-pattern</h2><ul><li>Small data sets: Amazon EMR is built for massively parallel processing</li><li>ACID transaction requirements</li></ul><h1 id="glue"><a class="markdownIt-Anchor" href="#glue"></a> Glue</h1><h2 id="anti-pattern-3"><a class="markdownIt-Anchor" href="#anti-pattern-3"></a> Anti-Pattern</h2><ul><li>Streaming data</li><li>Multiple ETL engines (Glue ETL jobs are PySpark-based)</li><li>NoSQL databases not supported</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> Big Data </tag>
            
            <tag> AWS White Paper </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudWatch 2019</title>
      <link href="2019/07/29/markdown/AWS/AWS2019/Cloudwatch/"/>
      <url>2019/07/29/markdown/AWS/AWS2019/Cloudwatch/</url>
      
        <content type="html"><![CDATA[<h1 id="some-numbers-about-cloudwatch"><a class="markdownIt-Anchor" href="#some-numbers-about-cloudwatch"></a> Some numbers about CloudWatch</h1><ul><li>As of Oct 2018, 100 petabytes of logs per month</li><li>CloudWatch egress<ul><li>S3; Lambda; Elasticsearch; Kinesis Data Firehose</li></ul></li></ul><h1 id="cloudwatch-logs-insight"><a class="markdownIt-Anchor" href="#cloudwatch-logs-insight"></a> CloudWatch Logs Insights</h1><p>Similar to log search in Elasticsearch (hands-on: investigating a traffic security issue).</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/g1wxfYVjCPY" target="_blank" rel="noopener">https://youtu.be/g1wxfYVjCPY</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Cloudwatch </tag>
            
            <tag> Cloudwatch Insights </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Amazon Lambda</title>
      <link href="2019/07/25/markdown/AWS/AWS2019/Lambda/"/>
      <url>2019/07/25/markdown/AWS/AWS2019/Lambda/</url>
      
        <content type="html"><![CDATA[<h1 id="a-serverless-journey-aws-lambda-under-the-hood"><a class="markdownIt-Anchor" href="#a-serverless-journey-aws-lambda-under-the-hood"></a> A Serverless Journey: AWS Lambda Under the Hood</h1><h2 id="lambda-load-balancing"><a class="markdownIt-Anchor" href="#lambda-load-balancing"></a> Lambda Load Balancing</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/lambda_components.png?raw=true" alt="lambda_components"></p><ul><li><p><strong>Front End Invoke</strong>: authenticates the caller, loads configs &amp; env; confirms concurrency with the <strong>Counting Service</strong></p></li><li><p><strong>Counting Service</strong>: region-wide view of concurrency to help enforce limits (quorum protocol, 2/3 agreement); &lt;1.5 ms response time</p></li><li><p><strong>Worker Manager</strong>: assumes the role, tracks the container lifecycle (running, idle) and maintains the worker pool</p></li><li><p><strong>Worker</strong>: provisions a sandbox, downloads the customer code and runs it</p><ul><li>A warm sandbox is one that finished a previous run and can be reused</li><li>A sandbox is roughly equivalent to a running Docker container</li></ul></li><li><p><strong>Placement Service</strong>: provisions workers</p></li><li><p>Example</p><ul><li>Fannie Mae scales between 20 and 50,000 concurrent executions over minutes.</li></ul></li></ul><h2 id="lambda-handling-failures"><a class="markdownIt-Anchor" href="#lambda-handling-failures"></a> Lambda Handling Failures</h2><ul><li>Multi-AZ</li></ul><h2 id="security-isolation"><a class="markdownIt-Anchor" href="#security-isolation"></a> Security Isolation</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2019/images/lambda_layers.png?raw=true" alt="lambda_layers"></p><ul><li>Worker level: EC2 instances</li><li>Worker level: EC2 bare metal (no hardware shared with other accounts)<ul><li>Firecracker microVMs</li></ul></li><li>Virtual devices have very limited access, to improve security</li></ul><h2 id="managing-utilization"><a class="markdownIt-Anchor" href="#managing-utilization"></a> Managing Utilization</h2><ul><li>Keep the servers busy</li><li>Utilization is handled by AWS<ul><li>Lambda uses placement algorithms that concentrate the load rather than spread it</li><li>Lambda packs different, uncorrelated workloads onto one server so that similar workloads do not spike together.</li></ul></li></ul><h2 id="lambda-benefit"><a class="markdownIt-Anchor" href="#lambda-benefit"></a> Lambda benefits</h2><ul><li>Load Balancing</li><li>Auto Scaling</li><li>Handling Failures</li><li>Security Isolation</li><li>Managing Utilization</li></ul><h2 id="new-features"><a class="markdownIt-Anchor" href="#new-features"></a> New features</h2><ul><li>Change introduced in 2019<ul><li>Lambda connects out via a shared remote NAT through an ENI to the outside</li></ul></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/QdzV04T_kec" target="_blank" rel="noopener">https://youtu.be/QdzV04T_kec</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Lambda </tag>
            
            <tag> Serverless </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - IoT</title>
      <link href="2019/07/23/markdown/AWS/AWS2019/IoT/"/>
      <url>2019/07/23/markdown/AWS/AWS2019/IoT/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/LbeWdLaXYDo" target="_blank" rel="noopener">https://youtu.be/LbeWdLaXYDo</a></p></blockquote><h1 id="home-automation-home-security-home-networking"><a class="markdownIt-Anchor" href="#home-automation-home-security-home-networking"></a> Home Automation; Home Security; Home Networking</h1><p>Amazon FreeRTOS / Greengrass --&gt; IoT Core, device management, Analytics / Databases, ML --&gt; IoT applications</p><h2 id="demo-from-vestel"><a class="markdownIt-Anchor" href="#demo-from-vestel"></a> Demo from Vestel</h2><ul><li>Vestel</li><li>Dedicated IoT group</li><li>Highlights of the current architecture<ul><li>Uses IoT Core</li><li>Uses API Gateway to support services for both Alexa and Google Home</li><li>Uses Lambda to run logic against IoT Core, trying out serverless</li></ul></li></ul><h2 id="simplify-large-number-of-iot-devices"><a class="markdownIt-Anchor" href="#simplify-large-number-of-iot-devices"></a> Simplifying Provisioning for Large Numbers of IoT Devices</h2><ul><li>WPA3 specification and the new Device Provisioning Protocol (DPP)</li><li>Use a mobile phone to scan the device’s barcode (QR code) and get the device’s public key; the router then automatically allows the device to connect to the internet.</li></ul><h2 id="home-security-monitoring"><a class="markdownIt-Anchor" href="#home-security-monitoring"></a> Home Security &amp; Monitoring</h2><ul><li>Amazon FreeRTOS</li><li>AWS Greengrass: allows local FreeRTOS devices to communicate with each other</li><li>SageMaker: train the model, then export it to S3</li><li>IoT Core: create a rule that subscribes to the sound data and routes it to a Lambda function, which calls the trained model to detect the sound.</li><li>Push the model to Greengrass (local); devices can then send data to the local Greengrass core, which runs the same Lambda function.</li><li>Greengrass discovery: a device can discover and connect to a Greengrass core device</li></ul><h2 
id="home-networking"><a class="markdownIt-Anchor" href="#home-networking"></a> Home Networking</h2><ul><li>Greengrass as a hub</li><li>Use AWS IoT Device Defender to detect unusual publishing behavior</li></ul><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><blockquote><p><a href="https://youtu.be/HEQkVHxu46A" target="_blank" rel="noopener">https://youtu.be/HEQkVHxu46A</a></p></blockquote><h2 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h2><h3 id="edge-endpoint-amazon-freertos"><a class="markdownIt-Anchor" href="#edge-endpoint-amazon-freertos"></a> Edge / Endpoint: Amazon FreeRTOS</h3><ul><li>OTA: over-the-air updates</li></ul><h3 id="device-gateway-greengrass-core"><a class="markdownIt-Anchor" href="#device-gateway-greengrass-core"></a> Device Gateway: Greengrass Core</h3><ul><li>Can be on-premises or in the cloud</li><li>Protocols: MQTT, WebSockets, HTTP</li><li>TLS 1.2 only</li><li>Message broker</li></ul><h3 id="device-management-iot-device-management"><a class="markdownIt-Anchor" href="#device-management-iot-device-management"></a> Device Management: IoT Device Management</h3><ul><li>Batch fleet provisioning</li><li>Device search</li></ul><h3 id="iot-device-defender"><a class="markdownIt-Anchor" href="#iot-device-defender"></a> IoT Device Defender</h3><ul><li>Audit device configuration / monitor / identify anomalies / alert / patch</li><li>For example, security best-practice checks (such as certificates shared between devices)</li></ul><h3 id="iot-analytics"><a class="markdownIt-Anchor" href="#iot-analytics"></a> IoT Analytics</h3><ul><li>Pipelines --&gt; Analysis / ML</li></ul><h3 id="other-features"><a class="markdownIt-Anchor" href="#other-features"></a> Other Features</h3><ul><li>1-Click: pre-provisioned devices, 
like the AWS purchase button.</li></ul><h2 id="demo-modjoul"><a class="markdownIt-Anchor" href="#demo-modjoul"></a> Demo – Modjoul</h2><ul><li><p>8 sensors, 50 MB of data per person per day</p></li><li><p>2 weeks of data stored locally</p></li><li><p>Uses IoT Analytics to replace EMR</p></li><li><p>Comment: I don’t like this solution… haha</p></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> IoT </tag>
            
            <tag> WPA3 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Polly</title>
      <link href="2019/07/23/markdown/AWS/AWS2019/Polly/"/>
      <url>2019/07/23/markdown/AWS/AWS2019/Polly/</url>
      
        <content type="html"><![CDATA[<p><a href="https://aws.amazon.com/blogs/machine-learning/build-your-own-text-to-speech-applications-with-amazon-polly/#" target="_blank" rel="noopener">https://aws.amazon.com/blogs/machine-learning/build-your-own-text-to-speech-applications-with-amazon-polly/#</a></p><ul><li>The Lambda runtime changed to Python 3.7<ul><li>Two lines of the blog’s code need to be updated:</li></ul></li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Python 3: print is a function, so its argument must be in parentheses</span></span><br><span class="line"><span class="built_in">print</span>(<span class="string">"Text to Speech function. Post ID in DynamoDB: "</span> + postId)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># In Python 3 it matters whether a file is opened in binary or text mode: stream.read() returns bytes, so add the "b" flag to append in binary mode</span></span><br><span class="line"><span class="keyword">with</span> open(output, <span class="string">"ab"</span>) <span class="keyword">as</span> file:</span><br><span class="line">    file.write(stream.read())</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Polly </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Amazon Route 53</title>
      <link href="2019/07/23/markdown/AWS/AWS2019/Route53/"/>
      <url>2019/07/23/markdown/AWS/AWS2019/Route53/</url>
      
        <content type="html"><![CDATA[<h1 id="route53-resolver-released-in-201812"><a class="markdownIt-Anchor" href="#route53-resolver-released-in-201812"></a> Route 53 Resolver (released December 2018)</h1><ul><li>Issue: in a hybrid architecture, resources in a VPC can’t resolve data center DNS names, and the data center can’t resolve the VPC’s private DNS names.</li><li>Traditional workaround:<ul><li>Spin up EC2 instances running BIND or Unbound as DNS servers that forward requests to the VPC’s “+2” resolver (the Amazon-provided DNS server at the VPC CIDR base address plus two)</li><li>You need to handle failover yourself, sometimes with a group of DNS servers per VPC</li></ul></li><li>This requirement is called recursive DNS lookup.</li></ul><h2 id="how-route53-resolver-works"><a class="markdownIt-Anchor" href="#how-route53-resolver-works"></a> How Route 53 Resolver Works</h2><ul><li>Works within a single region (can’t span regions)</li><li>Multiple VPCs under multiple accounts (as long as they are in the same region) can share the same Resolver endpoint</li><li>You need to provision ENIs for the Resolver endpoint; for HA and performance, provisioning multiple ENIs is recommended<ul><li>An endpoint’s ENIs serve one direction of queries (for example, from the VPC to on-premises)</li></ul></li><li>When a resolution request is received, it is checked against all Resolver rules; if no rule matches, it is resolved locally.<ul><li>Rules can be shared between accounts via AWS Resource Access Manager (RAM)</li></ul></li></ul><h2 id="route-53-resolver-demo"><a class="markdownIt-Anchor" href="#route-53-resolver-demo"></a> Route 53 Resolver Demo</h2><ul><li><p>Resolving sequence</p><ul><li>Auto-defined rules: VPC / private hosted zones / internet resolver</li><li>Extra rules<ul><li>Tip: add a “.” rule as the default forwarding rule; anything that doesn’t match the auto-defined rules goes to <a href="http://ns.mycompany.com" target="_blank" rel="noopener">ns.mycompany.com</a></li><li>Tip: <a href="http://ns.mycompany.com" target="_blank" rel="noopener">ns.mycompany.com</a> has its own “.” rule that resolves requests recursively on the internet if no other rule matches</li><li>Tip: a rule to 
<strong>forward</strong> any request for <a href="http://acquiredcompany.com" target="_blank" rel="noopener">acquiredcompany.com</a> to <a href="http://ns.acquiredcompany.com" target="_blank" rel="noopener">ns.acquiredcompany.com</a></li></ul></li></ul></li><li><p>Use the API to create endpoints:</p><ul><li>Endpoints need an attached security group that allows port 53</li><li>API to create rules</li><li>API to share defined rules</li></ul></li><li><p>Monitoring: CloudWatch and CloudTrail</p></li></ul><h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><p>Authoritative DNS<br>Recursive DNS</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/D1n5kDTWidQ" target="_blank" rel="noopener">https://youtu.be/D1n5kDTWidQ</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Route53 </tag>
            
            <tag> Hybrid Cloud </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Database family</title>
      <link href="2019/07/22/markdown/AWS/AWS2019/BestPractice_selectDataLayer/"/>
      <url>2019/07/22/markdown/AWS/AWS2019/BestPractice_selectDataLayer/</url>
      
        <content type="html"><![CDATA[<h1 id="use-the-right-tool-for-the-right-job"><a class="markdownIt-Anchor" href="#use-the-right-tool-for-the-right-job"></a> Use the Right Tool for the Right Job</h1><p>Aurora benefits:</p><ul><li><p>Up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL</p></li><li><p>Up to 15 read replicas</p></li><li><p>Six copies of data across 3 AZs, with continuous backup to S3</p></li><li><p>AWS DMS (Database Migration Service)</p></li></ul><h1 id="new-tools"><a class="markdownIt-Anchor" href="#new-tools"></a> New Tools</h1><blockquote><p>Data tools don’t compete with each other; they complement each other.<br>Pick the use case, then apply the corresponding technology.</p></blockquote><ul><li>RDB</li><li>Key-value</li><li>Document</li><li>In-memory</li><li>Graph (Neptune)</li><li>Time-series</li><li>Ledger</li></ul><h2 id="rdb-key-value-graph"><a class="markdownIt-Anchor" href="#rdb-key-value-graph"></a> RDB, Key-Value, Graph</h2><p>RDB: data integrity; transactions<br>Key-value: partitioned by key; consistent performance at scale<br>Graph: <strong>vertices</strong> and edges</p><h2 id="case-study"><a class="markdownIt-Anchor" href="#case-study"></a> Case Study</h2><ul><li><p>Airbnb</p><ul><li>DynamoDB for user search history</li><li>ElastiCache: caching</li><li>RDS: transactional data</li></ul></li><li><p>A bookstore</p><ul><li>DynamoDB (key-value) stores book information</li><li>Elasticsearch: stream DynamoDB changes to trigger a Lambda function that writes them into the Elasticsearch index</li><li>Leaderboard: use ElastiCache; (???) 
sorting</li><li>Recommendation engine: use a graph database to record people, books, and purchases</li></ul></li></ul><h2 id="ledger-database"><a class="markdownIt-Anchor" href="#ledger-database"></a> Ledger Database</h2><p>Industries: healthcare, government, manufacturing, HR &amp; payroll</p><ul><li>Requirement: data that is immutable, traceable, and cryptographically verifiable</li><li>Blockchain is hard to maintain</li><li>Amazon Quantum Ledger Database (QLDB): immutable, cryptographically verifiable, highly scalable, easy to use</li></ul><h2 id="time-series-data-aws-timestream"><a class="markdownIt-Anchor" href="#time-series-data-aws-timestream"></a> Time-Series Data: Amazon Timestream</h2><p>What kind of data is time-series data?</p><ul><li>Weather, IoT, and DevOps data</li><li>Time-series data always has time on the x-axis; the y values are flexible and can change in flight</li><li>Data moves from hot -&gt; warm -&gt; cold storage</li><li>Millions of inserts (10M/second); serverless; trillions of events per day</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>Databases on AWS: The Right Tool for the Right Job (good presentation)<br><a href="https://youtu.be/-pb-DkD6cWg" target="_blank" rel="noopener">https://youtu.be/-pb-DkD6cWg</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Differentiation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - EFS</title>
      <link href="2019/07/22/markdown/AWS/AWS2019/EFS/"/>
      <url>2019/07/22/markdown/AWS/AWS2019/EFS/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/4FQvJ2q6_oA" target="_blank" rel="noopener">https://youtu.be/4FQvJ2q6_oA</a></p></blockquote><ul><li>AWS describes 3 main adoption patterns, which map to 3 storage categories<ul><li>Re-hosting: block storage</li><li>Re-platforming: file storage (EFS)</li><li>Re-architecting: object storage</li></ul></li></ul><h2 id="whats-new"><a class="markdownIt-Anchor" href="#whats-new"></a> What’s new</h2><ul><li>EFS only supports Linux; new: FSx for Windows File Server</li><li>FSx for Lustre</li><li>Multi-VPC access support</li><li>AWS DataSync: an initial full copy, then incremental transfers of changed data to the cloud; multi-threaded</li><li>TCO example: 100 GB of Standard storage plus 400 GB of Infrequent Access costs around $50/month</li></ul><h2 id="deep-dive"><a class="markdownIt-Anchor" href="#deep-dive"></a> Deep Dive</h2><ul><li>Performance mode<ul><li>General Purpose: optimized for low latency (max 7,000 IOPS); recommended starting point</li><li>Max I/O: optimized for aggregate I/O (higher latencies)</li></ul></li><li>Throughput mode<ul><li>Bursting Throughput: recommended starting point</li><li>Provisioned Throughput (can be decreased once every 24 hours)</li></ul></li><li>EFS Infrequent Access (85% cheaper)<ul><li>Automatic lifecycle management (moves files not accessed for 30 days)</li></ul></li></ul><h3 id="security-model"><a class="markdownIt-Anchor" href="#security-model"></a> Security Model</h3><p>Network access controlled with ACLs; file access controlled with POSIX permissions or IAM; encryption; compliance with HIPAA, etc.</p><h2 id="use-case"><a class="markdownIt-Anchor" href="#use-case"></a> Use Cases</h2><ul><li>Atlassian: Jira</li><li>T-Mobile<ul><li>Kubernetes with EFS (persistent volumes for hundreds of nodes)</li><li>Caching build dependencies for CI/CD (e.g., Maven dependencies)</li><li>Centralized repository</li><li>TIBCO EMS HA</li></ul></li></ul><h2 id="best-practices"><a 
class="markdownIt-Anchor" href="#best-practices"></a> Best Practices</h2><ul><li>Throughput<ul><li>Multiple threads</li><li>Multiple directories</li><li>Large I/O sizes (aggregate I/O)</li></ul></li><li>IOPS<ul><li>Multiple threads</li><li>Multiple directories</li></ul></li><li>Use CloudWatch to monitor</li></ul><h1 id="choose-the-right-performance-with-file-system-2018"><a class="markdownIt-Anchor" href="#choose-the-right-performance-with-file-system-2018"></a> Choosing the Right Performance for Your File System (2018)</h1><ul><li>Since 2018, EFS supports provisioned throughput</li><li>Similar to EBS provisioned IOPS, but independent of storage size; can be modified using the CLI<ul><li>Automatic throughput provisioning is under consideration but not available yet</li></ul></li><li>Demo<ul><li>ioping: a tool to monitor I/O latency in real time</li><li>nload: monitors network traffic (EFS is mounted over the network)</li><li>Multiple threads increase throughput</li><li>Use the AWS CLI (<code>aws efs</code>) to update the throughput limit; the change takes effect in flight</li></ul></li><li>The EFS mount helper can help you figure out the configuration you need</li><li>Ready-to-use EFS CloudWatch metrics help you set up, monitor, and tune EFS</li></ul><p>Use scenarios: web, CI/CD, dev, big data, ML, DB backup<br>Compliance: healthcare, PCI (payment data); encryption at rest and in transit are both supported (no extra cost, but with some performance impact); built-in support for KMS and CMKs.<br>EFS soft limit: 1 GB/s throughput in all regions (can be increased on request)<br><strong>EFS File Sync</strong>: a new feature for migrating local data into EFS with multiple threads, securely</p><p>Security: KMS CMK</p><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><blockquote><p><a href="https://youtu.be/TS1wS_Wb6PA" target="_blank" rel="noopener">https://youtu.be/TS1wS_Wb6PA</a></p></blockquote><blockquote><p><a href="https://github.com/koct9i/ioping" 
target="_blank" rel="noopener">https://github.com/koct9i/ioping</a></p></blockquote><blockquote><p><a href="https://aws.amazon.com/blogs/aws/efs-file-sync-faster-file-transfer-to-amazon-efs-file-systems/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/efs-file-sync-faster-file-transfer-to-amazon-efs-file-systems/</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> EFS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Greengrass</title>
      <link href="2019/07/04/markdown/AWS/AWS2018/GreenGrass/"/>
      <url>2019/07/04/markdown/AWS/AWS2018/GreenGrass/</url>
      
        <content type="html"><![CDATA[]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> IoT </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - S3</title>
      <link href="2019/06/21/markdown/AWS/AWS2018/TroubleShooting/s3_access/"/>
      <url>2019/06/21/markdown/AWS/AWS2018/TroubleShooting/s3_access/</url>
      
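        <content type="html"><![CDATA[<h1 id="trouble-shooting-public-object-access-denied"><a class="markdownIt-Anchor" href="#trouble-shooting-public-object-access-denied"></a> Troubleshooting: Public Object Access Denied</h1><ul><li>The ACL and bucket policy are both set to public</li><li>Account-level and bucket-level public access settings allow the object to be public</li><li>Observation: objects uploaded from the console work; objects uploaded from another account fail.</li></ul><p>An object uploaded by another account is owned by that account, so the bucket policy does not grant access to it; the object’s own ACL has to make it public. Add the option below to the upload command to make the object publicly readable (<code>FILE</code>, <code>BUCKET</code>, and <code>KEY</code> are placeholders). To also give the original bucket owner full control, a single canned ACL is not enough; use explicit grants instead (for example, <code>--grant-read</code> together with <code>--grant-full-control</code> on <code>aws s3api put-object</code>).</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">aws s3 cp FILE s3://BUCKET/KEY --acl public-read</span><br></pre></td></tr></table></figure><h1 id="use-aws-js-s3-explorer"><a class="markdownIt-Anchor" href="#use-aws-js-s3-explorer"></a> Use aws-js-s3-explorer</h1><p><a href="https://github.com/awslabs/aws-js-s3-explorer" target="_blank" rel="noopener">https://github.com/awslabs/aws-js-s3-explorer</a></p>]]></content>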
      
      
      
        <tags>
            
            <tag> troubleshooting </tag>
            
            <tag> AWS </tag>
            
            <tag> S3 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kimball Dimensional Modeling Techniques Overview</title>
      <link href="2019/06/17/markdown/Datawarehouse/02_KimballDimensionalModelingTechniquesOverview/"/>
      <url>2019/06/17/markdown/Datawarehouse/02_KimballDimensionalModelingTechniquesOverview/</url>
      
        <content type="html"><![CDATA[<h1 id="fundamental-concepts"><a class="markdownIt-Anchor" href="#fundamental-concepts"></a> Fundamental Concepts</h1><h2 id="gather-business-requirements-and-data-realities"><a class="markdownIt-Anchor" href="#gather-business-requirements-and-data-realities"></a> Gather Business Requirements and Data Realities</h2><p>Examples in the book:</p><p>Chapter 1 DW/BI and Dimensional Modeling Primer, p. 5<br>Chapter 3 Retail Sales, p. 70<br>Chapter 11 Telecommunications, p. 297<br>Chapter 17 Lifecycle Overview, p. 412<br>Chapter 18 Dimensional Modeling Process and Tasks, p. 431<br>Chapter 19 ETL Subsystems and Techniques, p. 444</p><h2 id="collaborative-dimensional-modeling-workshops"><a class="markdownIt-Anchor" href="#collaborative-dimensional-modeling-workshops"></a> Collaborative Dimensional Modeling Workshops</h2><p>Dimensional models should be designed together with people who fully understand the business and its needs.</p><h2 id="four-step-dimensional-design-process"><a class="markdownIt-Anchor" href="#four-step-dimensional-design-process"></a> Four-Step Dimensional Design Process</h2><ul><li>Select the business process</li><li>Declare the grain</li><li>Identify the dimensions</li><li>Identify the facts</li></ul><h2 id="business-processes"><a class="markdownIt-Anchor" href="#business-processes"></a> Business Processes</h2><p>Business processes are the operational activities performed by the organization.</p><h2 id="grain"><a class="markdownIt-Anchor" href="#grain"></a> Grain</h2><p>The grain establishes exactly what a single fact table row represents.</p><h2 id="dimensions-for-descriptive-context"><a class="markdownIt-Anchor" href="#dimensions-for-descriptive-context"></a> Dimensions for Descriptive Context</h2><h2 id="facts-for-measurements"><a class="markdownIt-Anchor" href="#facts-for-measurements"></a> Facts for Measurements</h2><h2 id="star-schemas-and-olap-cubes"><a class="markdownIt-Anchor" href="#star-schemas-and-olap-cubes"></a> Star Schemas and OLAP Cubes</h2><h2 
id="graceful-extensions-to-dimensional-models"><a class="markdownIt-Anchor" href="#graceful-extensions-to-dimensional-models"></a> Graceful Extensions to Dimensional Models</h2><ul><li>Add new fact columns to the fact table</li><li>Add a foreign key column to the fact table that references a new dimension table</li><li>Add new attribute columns to a dimension table</li></ul><h1 id="basic-fact-table-techniques"><a class="markdownIt-Anchor" href="#basic-fact-table-techniques"></a> Basic Fact Table Techniques</h1><h2 id="fact-table-structure"><a class="markdownIt-Anchor" href="#fact-table-structure"></a> Fact Table Structure</h2><p>A fact table contains the numeric measures produced by an operational measurement event in the real world.</p><h2 id="additive-semi-additive-non-additive-facts"><a class="markdownIt-Anchor" href="#additive-semi-additive-non-additive-facts"></a> Additive, Semi-Additive, Non-Additive Facts</h2><p>Balance amounts are common semi-additive facts because they are additive across all dimensions except time.<br>Some measures are completely non-additive, such as ratios.</p><h2 id="nulls-in-fact-tables"><a class="markdownIt-Anchor" href="#nulls-in-fact-tables"></a> Nulls in Fact Tables</h2><p><strong>Nulls must be avoided in the fact table’s foreign keys</strong>, because a null foreign key would violate referential integrity.</p><h2 id="conformed-facts"><a class="markdownIt-Anchor" href="#conformed-facts"></a> Conformed Facts</h2><p>If the same measure appears in different fact tables, it must use the same name and definition.</p><h2 id="transaction-fact-tables"><a class="markdownIt-Anchor" href="#transaction-fact-tables"></a> Transaction Fact Tables</h2><h2 id="periodic-snapshot-fact-tables"><a class="markdownIt-Anchor" href="#periodic-snapshot-fact-tables"></a> Periodic Snapshot Fact Tables</h2><h2 id="factless-fact-tables"><a class="markdownIt-Anchor" href="#factless-fact-tables"></a> Factless Fact Tables</h2><p>Example: a table recording student attendance events (which student attended school on which day).</p><h2 id="aggregate-fact-tables"><a class="markdownIt-Anchor" 
href="#aggregate-fact-tables"></a> Aggregate Fact Tables</h2><p>Used to accelerate query performance.</p><h2 id="consolidated-fact-table"><a class="markdownIt-Anchor" href="#consolidated-fact-table"></a> Consolidated Fact Table</h2><p>Actual sales and forecast sales are stored in the same table; this design makes analysis easy but ETL harder.</p><h1 id="basic-dimension-table-techniques"><a class="markdownIt-Anchor" href="#basic-dimension-table-techniques"></a> Basic Dimension Table Techniques</h1><h2 id="dimension-surrogate-key"><a class="markdownIt-Anchor" href="#dimension-surrogate-key"></a> Dimension Surrogate Key</h2><ul><li>Structure: wide, flat, denormalized tables with many low-cardinality text attributes.</li><li>Single primary key<ul><li>Can’t be the operational system’s natural key (tracking changes over time creates multiple dimension rows for the same natural key)</li><li>Use an anonymous integer primary key (surrogate key); the date dimension is exempt from this rule.</li></ul></li></ul><h2 id="natural-durable-and-supernatural-key"><a class="markdownIt-Anchor" href="#natural-durable-and-supernatural-key"></a> Natural, Durable and Supernatural Keys</h2><ul><li><p>Natural keys are created by the operational source system.</p></li><li><p>Durable (supernatural) keys are generated by the DW so that an entity keeps the same key even when its natural key changes (for example, an employee who leaves and rejoins).</p></li><li><p>Drilling down: the most fundamental way of analyzing data</p></li><li><p>Degenerate Dimensions</p></li></ul><blockquote><p>Example: an invoice with multiple line items. The line-item fact table has all the dimensions as foreign keys, and the invoice number becomes a dimension of the line-item fact table; but the invoice number has no attributes of its own, so it has no dimension table and stays in the fact table as a degenerate dimension. 
Degenerate dimensions are most common with transaction and accumulating snapshot fact tables.</p></blockquote><ul><li>Use descriptive text in dimension attributes instead of cryptic abbreviations, codes, flags, etc.</li><li>Why use a date dimension instead of computing dates in SQL? Because a date dimension carries more attributes, such as week number, holidays, and fiscal periods.<ul><li>Like other dimension tables, the date dimension also needs a default row (e.g., for unknown dates)</li></ul></li><li>Role-playing dimension: a dimension defined once but referenced multiple times in one fact table, with a different meaning each time. For example, the date dimension referenced as both order date and ship date.</li><li>Junk dimension: when a transaction has many low-value, low-cardinality flags and indicators, combine them into one dimension.</li><li>Snowflaked dimensions: when dimension tables are normalized into multiple related tables.</li><li>Outrigger dimensions: when a dimension references another dimension.<ul><li>For example, a dimension that references the date dimension.</li><li>As a baseline, all dimensions should support the fact table directly. 
The fact table shouldn’t need one dimension to get the key of another dimension.</li></ul></li></ul><h1 id="integration-via-conformed-dimensions"><a class="markdownIt-Anchor" href="#integration-via-conformed-dimensions"></a> Integration via Conformed Dimensions</h1><h1 id="dealing-with-slowly-changing-dimension-attributes"><a class="markdownIt-Anchor" href="#dealing-with-slowly-changing-dimension-attributes"></a> Dealing with Slowly Changing Dimension Attributes</h1><h1 id="dealing-with-dimension-hierarchies"><a class="markdownIt-Anchor" href="#dealing-with-dimension-hierarchies"></a> Dealing with Dimension Hierarchies</h1><h2 id="fixed-depth-positional-hierarchies"><a class="markdownIt-Anchor" href="#fixed-depth-positional-hierarchies"></a> Fixed Depth Positional Hierarchies</h2><h2 id="slightly-raggedvariable-depth-hierarchies"><a class="markdownIt-Anchor" href="#slightly-raggedvariable-depth-hierarchies"></a> Slightly Ragged/Variable Depth Hierarchies</h2><h2 id="raggedvariable-depth-hierarchies-with-hierarchy-bridge-tables"><a class="markdownIt-Anchor" href="#raggedvariable-depth-hierarchies-with-hierarchy-bridge-tables"></a> Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables</h2><h2 id="raggedvariable-depth-hierarchies-with-pathstring-attributes"><a class="markdownIt-Anchor" href="#raggedvariable-depth-hierarchies-with-pathstring-attributes"></a> Ragged/Variable Depth Hierarchies with Pathstring Attributes</h2><h1 id="advanced-fact-table-techniques"><a class="markdownIt-Anchor" href="#advanced-fact-table-techniques"></a> Advanced Fact Table Techniques</h1><h1 id="advanced-dimension-techniques"><a class="markdownIt-Anchor" href="#advanced-dimension-techniques"></a> Advanced Dimension Techniques</h1><h1 id="special-purpose-schemas"><a class="markdownIt-Anchor" href="#special-purpose-schemas"></a> Special Purpose Schemas</h1>]]></content>
      
      
      
        <tags>
            
            <tag> Datawarehouse </tag>
            
            <tag> Kimball </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kimball Dimensional Modeling Techniques Applied to an Inventory Sample</title>
      <link href="2019/06/17/markdown/Datawarehouse/04_Inventory/"/>
      <url>2019/06/17/markdown/Datawarehouse/04_Inventory/</url>
      
        <content type="html"><![CDATA[<h1 id="value-chain-introduction"><a class="markdownIt-Anchor" href="#value-chain-introduction"></a> Value Chain Introduction</h1><p>For the inventory value chain, the book introduces three models.</p><h2 id="inventory-periodic-model"><a class="markdownIt-Anchor" href="#inventory-periodic-model"></a> Inventory Periodic Snapshot Model</h2><ul><li>Scenario: a grocery chain with 60,000 products in each of 100 stores; with a daily periodic snapshot there are 60,000 × 100 = 6 million rows per day.</li><li>Estimation<ul><li>14 bytes per row × 6 million rows = 84 MB per day; 3 years is 84 MB × 1,095 days ≈ 92 GB</li><li>Alternatively, keep 60 days of daily snapshots and archive older data as weekly snapshots</li></ul></li><li>Semi-Additive Facts<ul><li>Be careful with SQL AVG when summarizing inventory levels over time: AVG averages over all rows returned by the query, not just over the number of time periods</li></ul></li><li>Enhanced Inventory Facts<ul><li>Add more columns to the fact table, including quantity on hand and quantity sold:<ul><li>daily quantity sold / daily quantity on hand = number of turns</li><li>total quantity sold for the year / average daily quantity on hand = number of turns for the year</li><li>estimated number of days’ supply = current quantity on hand / average quantity sold per day</li></ul></li><li>Add inventory value at cost and inventory value at the latest selling price</li></ul></li></ul><h2 id="inventory-transactions-model"><a class="markdownIt-Anchor" href="#inventory-transactions-model"></a> Inventory Transactions Model</h2><p>P117</p>]]></content>
      
      
      
        <tags>
            
            <tag> Datawarehouse </tag>
            
            <tag> Kimball </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Data Warehousing, Business Intelligence, and Dimensional Modeling Primer</title>
      <link href="2019/06/02/markdown/Datawarehouse/Overview/"/>
      <url>2019/06/02/markdown/Datawarehouse/Overview/</url>
      
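        <content type="html"><![CDATA[<h1 id="key-difference-between-operational-system-and-data-warehouse"><a class="markdownIt-Anchor" href="#key-difference-between-operational-system-and-data-warehouse"></a> Key Differences Between Operational Systems and Data Warehouses</h1><ul><li>One puts data in; the other gets data out.</li><li>One requires transactions and must keep the current state accurate, with business logic strictly following the process; the other requires large volumes of queries and comparisons, and its query requirements keep changing.</li></ul><h1 id="goals-of-data-warehousing-and-business-intelligence"><a class="markdownIt-Anchor" href="#goals-of-data-warehousing-and-business-intelligence"></a> Goals of Data Warehousing and Business Intelligence</h1><ul><li>The collected data is hard to use</li><li>The collected data isn’t query-friendly</li><li>Business users find it inconvenient to work with</li><li>The data is inconsistent</li><li>We want to make fact-based decisions</li></ul><p>So the DW needs to:</p><ul><li>Keep data close to business users and easy to understand</li><li>Be consistent: the same name must mean the same thing</li><li>Adapt to changing requirements, with changes transparent to users</li><li>Deliver data in a timely way, even though it must be cleaned and validated</li><li>Be secure: the DW’s information determines what a company sells, to whom, and at what price</li><li>Serve as a decision support system</li><li>Win the support and use of business users in order to succeed; unlike operational systems, the DW is optional, and it will be abandoned if it is hard to use</li></ul><h2 id="publishing-metaphor-for-dwbi-managers"><a class="markdownIt-Anchor" href="#publishing-metaphor-for-dwbi-managers"></a> Publishing Metaphor for DW/BI Managers</h2><p>Think of the DW as publishing a magazine. The DW needs to:</p><ul><li>Understand its readers</li><li>Delight its readers</li><li>Keep the publication going</li></ul><p>Like a magazine publisher, the DW team must choose data sources, make sure the data is accurate, present it to readers (users) in the right way, and update it regularly.</p><h1 id="dimensional-modeling-introduction"><a class="markdownIt-Anchor" href="#dimensional-modeling-introduction"></a> Dimensional Modeling Introduction</h1><p>Dimensional modeling addresses two difficult requirements at once:</p><ul><li>Being easy for business users to understand</li><li>Delivering fast query performance</li></ul><p>“We sell products in various markets and measure our performance over time”<br>This sentence contains 3 dimensions: “product”, “market”, and “time”.</p><p>Dimensional models are often implemented in relational databases, but they differ from 3NF (third normal form) models.</p><ul><li>3NF aims to remove redundancy and is an ER (entity-relationship) model; a dimensional model can also be expressed as an ER model</li><li>The key difference between 3NF and dimensional models is the degree of normalization</li><li>3NF is more highly normalized, so it is usually called a normalized model</li><li>The drawbacks of 3NF are complexity and poor query performance</li><li>Dimensional models are easy for users to understand, perform well for queries, and adapt easily to changing business requirements</li></ul><h2 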
id="star-schemas-versus-olap-cubes"><a class="markdownIt-Anchor" href="#star-schemas-versus-olap-cubes"></a> Star Schemas Versus OLAP Cubes</h2><ul><li>Dimensional model用关系型数据库实现就是Star Schema</li><li>Dimensional model用多维数据库实现就是OLAP data cube</li></ul><h2 id="olap-deployment-considerations"><a class="markdownIt-Anchor" href="#olap-deployment-considerations"></a> OLAP Deployment Considerations</h2><ul><li>Star Schema是基础</li><li>OLAP的性能优势在被新技术蚕食（例如内存数据库，columnar DB）</li><li>OLAP的表设计常常绑定技术提供商，移植性比较差。</li><li>OLAP的数据安全性比较好；可以做到限制用户只能看到summary</li><li>OLAP的分析能力更强大</li><li>OLAP对变化的dimension支持更好</li><li>OLAP支持snapshot fact但是不支持accumulate</li><li>OLAP对hirarchy等类型的数据查询支持比较好</li></ul><h2 id="fact-tables-for-measurements"><a class="markdownIt-Anchor" href="#fact-tables-for-measurements"></a> Fact Tables for Measurements</h2><ul><li>Each row in a fact table corresponds to a measurement event. 不能拆。</li></ul><blockquote><p>a measurement event in the physical world has a one-to-one<br>relationship to a single row in the corresponding fact table is a bedrock principle<br>for dimensional modeling</p></blockquote><ul><li>Facts are often described as continuously valued to help sort out what is a fact<br>versus a dimension attribute.<ul><li>Additivity fact : 销售额</li><li>Semi-Additivity fact： 例如account balance</li><li>Non-Additivity fact：例如产品单价</li></ul></li><li>textual Fact： 通常没有， 如果有也尽量放到Dimensional里面去</li><li>Empty item. 
Fact 里面一定要放发生的事件，没有发生不要尝试放0.</li><li>Fact表通常非常sparse； Fact表通常占据90%的存储； Fact表通常row非常大，column比较少；Fact表通常可以通过size预估行数</li><li>Fact表分三种：transaction, periodic snapshot, and accumulating snapshot.</li><li>Fact表至少有两个外键， 用来引用dimension表的主键</li><li><strong>referential integrity</strong> 保证Fact表的条目引用的每个外键都正确</li><li><strong>composite key</strong> Fact表的主键通常由所有的外键组合而成.</li></ul><h2 id="dimension-tables-for-descriptive-context"><a class="markdownIt-Anchor" href="#dimension-tables-for-descriptive-context"></a> Dimension Tables for Descriptive Context</h2><ul><li>Dimension table 用来定义measurable业务事件的textual context</li><li>Dimension 表描述who, what,when,where, how, why</li><li>Dimension表通常列非常多，通常50-100个很正常</li><li>Dimension表通常row少，column多</li><li>Dimension表只有一个主键</li><li>Dimension的attribute是主要的查询，分组以及报告label的来源</li><li>Dimension的attribute名字必须有业务含义</li><li>例如如果一个code有前两个字段代表一个含义，后面两个字段代表一个含义，设计的时候最好单独出来一个dimension而不是让客户查询的时候manipulate字符串</li><li>实际设计的时候，如何确定一个numeric value是fact还是dimensional – 确定它们是不是需要参与计算； 看数字是连续还是离散的</li></ul><h2 id="facts-and-dimensions-joined-in-a-star-schema"><a class="markdownIt-Anchor" href="#facts-and-dimensions-joined-in-a-star-schema"></a> Facts and Dimensions Joined in a Star Schema</h2><p>Benefit of Star Schema</p><ul><li><p>Easy to understand</p></li><li><p>Simplicity brings in performance benefits</p></li><li><p>Dimensional model are gracefuly extensible to accommodate change.</p><ul><li>Fact won’t change, but dimension values can.</li><li>By adding new rows to dimension table or alter current fact table to add new dimension FK will fulfilll the change requirement</li></ul><p>A sample of SQL for star schema</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">SELECT</span></span><br><span class="line">store.district_name,</span><br><span class="line">product.brand,</span><br><span class="line"><span class="keyword">sum</span>(sales_facts.sales_dollars) <span class="keyword">AS</span> <span class="string">"Sales Dollars"</span></span><br><span class="line"><span class="keyword">FROM</span></span><br><span class="line"><span class="keyword">store</span>,</span><br><span class="line">product,</span><br><span class="line"><span class="built_in">date</span>,</span><br><span class="line">sales_facts</span><br><span class="line"><span class="keyword">WHERE</span></span><br><span class="line">date.month_name=<span class="string">'January'</span> <span class="keyword">AND</span></span><br><span class="line">date.year=<span class="number">2013</span> <span class="keyword">AND</span></span><br><span class="line">store.store_key = sales_facts.store_key <span class="keyword">AND</span></span><br><span class="line">product.product_key = sales_facts.product_key <span class="keyword">AND</span></span><br><span class="line">date.date_key = sales_facts.date_key</span><br><span class="line"><span class="keyword">GROUP</span> <span class="keyword">BY</span></span><br><span class="line">store.district_name,</span><br><span class="line">product.brand</span><br></pre></td></tr></table></figure><p>The WHERE clause applies the filters and joins the fact table to the dimension tables; GROUP BY then establishes the aggregation.</p></li></ul><h1 id="kimballs-dwbi-architecture"><a class="markdownIt-Anchor" href="#kimballs-dwbi-architecture"></a> Kimball’s DW/BI Architecture</h1><p>4 components: Operational 
Source Systems, ETL system, data presentation area, BI applications</p><h2 id="operational-source-systems"><a class="markdownIt-Anchor" href="#operational-source-systems"></a> Operational Source Systems</h2><ul><li>Focus: performance and availability</li><li>Maintain little historical data</li></ul><h2 id="extract-transformation-and-load-system"><a class="markdownIt-Anchor" href="#extract-transformation-and-load-system"></a> Extract, Transformation, and Load System</h2><ul><li>Extraction: move the data into the DW scope</li><li>Transformation: enrich, de-duplicate, etc.</li><li>Load the data into the dimensional model<ul><li>including surrogate key assignment</li></ul></li></ul><blockquote><p>A common industry debate: should the ETL landing area use a normalized structure? It doesn’t need to.</p></blockquote><h2 id="presentation-area-to-support-business-intelligence"><a class="markdownIt-Anchor" href="#presentation-area-to-support-business-intelligence"></a> Presentation Area to Support Business Intelligence</h2><ul><li>Baseline: data must be presented in dimensional schemas or OLAP cubes; this is widely accepted in the industry.</li><li>The presentation area must contain atomic data (not just summary data);<ul><li>it’s <strong>unacceptable</strong> to store atomic data in a 3NF model and put only summary data into star schemas.</li></ul></li><li>The presentation area should be organized around business process measurements and span organizational (departmental) boundaries.</li><li>When the bus architecture is used as a framework, you can develop the enterprise data warehouse in an agile, decentralized, realistically scoped, iterative manner.</li></ul><blockquote><p>Data in the queryable presentation area of the DW/BI system must be dimensional, atomic (complemented by performance-enhancing aggregates), business process-centric, and adhere to the enterprise data warehouse bus architecture. The data must not be structured according to individual departments’ interpretation of the data.</p></blockquote><h2 id="business-intelligence-applications"><a 
class="markdownIt-Anchor" href="#business-intelligence-applications"></a> Business Intelligence Applications</h2><ul><li>Tableau (??)</li></ul><h2 id="restaurant-metaphor-for-the-kimball-architecture"><a class="markdownIt-Anchor" href="#restaurant-metaphor-for-the-kimball-architecture"></a> Restaurant Metaphor for the Kimball Architecture</h2><ul><li>ETL: the back-room kitchen</li></ul><p>ETL should focus on:<br><strong>Quality</strong><br><strong>Consistency</strong><br><strong>Integrity</strong></p><p>DW/BI patrons should be kept out of the ETL kitchen.</p><ul><li>Data presentation and BI: the front dining room</li></ul><p>Focus: the presentation area’s food, decor, service, and cost; the data must be properly organized and utilized to deliver what patrons need.</p><h1 id="alternative-dwbi-architectures"><a class="markdownIt-Anchor" href="#alternative-dwbi-architectures"></a> Alternative DW/BI Architectures</h1><h2 id="independent-data-mart-architecture"><a class="markdownIt-Anchor" href="#independent-data-mart-architecture"></a> Independent Data Mart Architecture</h2><ul><li>Data goes through separate ETL processes and lands in separate models, each designed for a different front room.</li><li>No centralized data governance</li><li>Low cost in the short term; each model usually already uses a star schema</li></ul><h2 id="hub-and-spoke-corporate-information-factory-inmon-architecture"><a class="markdownIt-Anchor" href="#hub-and-spoke-corporate-information-factory-inmon-architecture"></a> Hub-and-Spoke Corporate Information Factory Inmon Architecture</h2><ul><li>3NF is enforced</li></ul><h2 id="hybrid-hub-and-spoke-and-kimball-architecture"><a class="markdownIt-Anchor" href="#hybrid-hub-and-spoke-and-kimball-architecture"></a> Hybrid Hub-and-Spoke and Kimball Architecture</h2><ul><li>Two layers of ETL</li><li>Source -&gt; 3NF -&gt; Kimball</li></ul><h1 id="dimensional-modeling-myths"><a class="markdownIt-Anchor" href="#dimensional-modeling-myths"></a> Dimensional Modeling Myths</h1><h2 
id="myth-1-dimensional-only-for-summary-data"><a class="markdownIt-Anchor" href="#myth-1-dimensional-only-for-summary-data"></a> Myth 1: Dimensional Models Are Only for Summary Data</h2><p>Summary data should complement the granular details solely to improve performance for common queries, <strong>not replace the details.</strong><br>The amount of history in a dimensional model should be driven only by business requirements, not by performance concerns.</p><h2 id="myth-2-dimensional-for-departmental"><a class="markdownIt-Anchor" href="#myth-2-dimensional-for-departmental"></a> Myth 2: Dimensional Models Are Departmental</h2><h2 id="myth-3-dimensional-are-not-scalable"><a class="markdownIt-Anchor" href="#myth-3-dimensional-are-not-scalable"></a> Myth 3: Dimensional Models Are Not Scalable</h2><p>It’s common for fact tables to have billions of rows; fact tables with 2 trillion rows have been seen.<br>The key difference between 3NF and dimensional models is that dimensional models are easier to understand.</p><h2 id="myth-4-dimensional-only-for-predictable-usage"><a class="markdownIt-Anchor" href="#myth-4-dimensional-only-for-predictable-usage"></a> Myth 4: Dimensional Models Are Only for Predictable Usage</h2><p>The model is centered on measurement processes, not on predefined reports or analyses.<br>“God is in the details”</p><h2 id="myth-5-dimensional-cant-be-integrated"><a class="markdownIt-Anchor" href="#myth-5-dimensional-cant-be-integrated"></a> Myth 5: Dimensional Models Can’t Be Integrated</h2><p>Data integration depends on standardized labels, values, and definitions.</p><h1 id="more-reasons-to-think-dimensionally"><a class="markdownIt-Anchor" href="#more-reasons-to-think-dimensionally"></a> More Reasons to Think Dimensionally</h1><p>Robust dimensions translate into robust DW/BI systems.</p><h1 id="agile-considerations"><a class="markdownIt-Anchor" href="#agile-considerations"></a> Agile Considerations</h1><h1 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h1>]]></content>
      
      
      
        <tags>
            
            <tag> Datawarehouse </tag>
            
            <tag> BI </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>PostgreSQL</title>
      <link href="2019/06/01/markdown/BackToBasic/Postgres/Management/"/>
      <url>2019/06/01/markdown/BackToBasic/Postgres/Management/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/</a></p><p>CREATE ROLE readwrite;<br>GRANT CONNECT ON DATABASE "Datawarehouse" TO readwrite;<br>GRANT USAGE ON SCHEMA "dw_cons" TO readwrite;<br>GRANT USAGE, CREATE ON SCHEMA "dw_cons" TO readwrite;<br>GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA "dw_cons" TO readwrite;<br>ALTER DEFAULT PRIVILEGES IN SCHEMA "dw_cons" GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite;<br>GRANT USAGE ON ALL SEQUENCES IN SCHEMA "dw_cons" TO readwrite;<br>ALTER DEFAULT PRIVILEGES IN SCHEMA "dw_cons" GRANT USAGE ON SEQUENCES TO readwrite;</p><p>GRANT readonly TO "tableau_read";<br>GRANT readwrite TO "tibco_write";</p>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> PostgreSQL </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Google Cloud Study Jam</title>
      <link href="2019/05/14/markdown/Trending/Google/CloudStudyJam/"/>
      <url>2019/05/14/markdown/Trending/Google/CloudStudyJam/</url>
      
        <content type="html"><![CDATA[<p>gcloud ai-platform local predict \<br>--model-dir output/export/census/1557796906 \<br>--json-instances …/test.json</p><p>MODEL_BINARIES=$OUTPUT_PATH/export/census/1557797507/</p>]]></content>
      
      
      
        <tags>
            
            <tag> Jam </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Site to Site VPN</title>
      <link href="2019/05/11/markdown/AWS/AWS2018/Site2SiteVPN/"/>
      <url>2019/05/11/markdown/AWS/AWS2018/Site2SiteVPN/</url>
      
        <content type="html"><![CDATA[<h1 id="basic-steps"><a class="markdownIt-Anchor" href="#basic-steps"></a> Basic Steps</h1><h2 id="cloudformation"><a class="markdownIt-Anchor" href="#cloudformation"></a> CloudFormation</h2><ul><li>VPC with only a private subnet; route table declared</li><li>VGW created and attached to the VPC</li><li>Route propagation enabled from the VGW to the route table</li><li>CGW information declared</li></ul><h2 id="create-site2sitevpn"><a class="markdownIt-Anchor" href="#create-site2sitevpn"></a> Create Site2SiteVPN</h2><ul><li><p>Pay attention to the IPsec tunnel interconnection IP CIDR<br><a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-vpnconnection-vpntunneloptionsspecification.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-vpnconnection-vpntunneloptionsspecification.html</a></p></li><li><p>Download the configuration and apply it on the client side</p><ul><li>Pay attention to the propagated CIDR</li></ul></li></ul><p>Client Side</p><ol><li>Confirm the customer gateway supports BGP</li><li>Allocate the IPsec tunnel interconnection IP CIDR</li><li>Allocate the AWS VPC IP range</li><li>Confirm the data centre’s propagated IP ranges (default is 0.0.0.0)</li></ol>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Site2SiteVPN </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>EV3 Project</title>
      <link href="2019/03/10/markdown/Trending/EV3/Loading/"/>
      <url>2019/03/10/markdown/Trending/EV3/Loading/</url>
      
        <content type="html"><![CDATA[<h1 id="preparation"><a class="markdownIt-Anchor" href="#preparation"></a> Preparation</h1><p>Flash the machine<br><a href="https://sites.google.com/site/ev3devpython/setting-up-vs-code" target="_blank" rel="noopener">https://sites.google.com/site/ev3devpython/setting-up-vs-code</a></p><p>Connecting from a Mac<br><a href="https://www.ev3dev.org/docs/tutorials/connecting-to-ev3dev-with-ssh/" target="_blank" rel="noopener">https://www.ev3dev.org/docs/tutorials/connecting-to-ev3dev-with-ssh/</a></p><p>Issues<br><a href="https://github.com/ev3dev/ev3dev/issues/1220" target="_blank" rel="noopener">https://github.com/ev3dev/ev3dev/issues/1220</a></p><p>Wireless</p>]]></content>
      
      
      
        <tags>
            
            <tag> EV3 </tag>
            
            <tag> Robotics </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Notes about SSO with Azure</title>
      <link href="2019/02/06/markdown/AWS/AWS2018/Azure_SSO_WithAWS/"/>
      <url>2019/02/06/markdown/AWS/AWS2018/Azure_SSO_WithAWS/</url>
      
        <content type="html"><![CDATA[<h1 id="update-single-azure-to-sso-to-multiple-aws"><a class="markdownIt-Anchor" href="#update-single-azure-to-sso-to-multiple-aws"></a> Update: Single Azure AD SSO to Multiple AWS Accounts</h1><ul><li>The identifier must be unique; it can be any string.</li></ul><h1 id="config-azure-ad-sso-to-aws-console-via-smal"><a class="markdownIt-Anchor" href="#config-azure-ad-sso-to-aws-console-via-smal"></a> Config Azure AD SSO to AWS Console via SAML</h1><h2 id="azure-official-doc"><a class="markdownIt-Anchor" href="#azure-official-doc"></a> Azure Official Doc</h2><p><a href="https://docs.microsoft.com/en-us/azure/active-directory/saas-apps/amazon-web-service-tutorial" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/azure/active-directory/saas-apps/amazon-web-service-tutorial</a></p><h2 id="aditional-notes"><a class="markdownIt-Anchor" href="#aditional-notes"></a> Additional Notes</h2><p>The following configuration is not covered by the doc above but was needed:</p><p>Example of claim key/values:</p><ul><li>Name: emailaddress</li><li>Namespace: <a href="http://schemas.xmlsoap.org/ws/2005/05/identity/claims" target="_blank" rel="noopener">http://schemas.xmlsoap.org/ws/2005/05/identity/claims</a></li><li>Source: Attribute</li><li>Source attribute: user.mail</li></ul><p>Full configuration:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress</span><br><span class="line">user.mail</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname</span><br><span class="line">user.givenname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name</span><br><span class="line">user.userprincipalname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier</span><br><span class="line">user.userprincipalname</span><br><span class="line"></span><br><span class="line">http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname</span><br><span class="line">user.surname</span><br><span class="line"></span><br><span class="line">https://aws.amazon.com/SAML/Attributes/Role</span><br><span class="line">user.assignedroles</span><br><span class="line"></span><br><span class="line">https://aws.amazon.com/SAML/Attributes/RoleSessionName</span><br><span class="line">user.userprincipalname</span><br></pre></td></tr></table></figure><p>After the configuration succeeds, log in via<br><a href="https://account.activedirectory.windowsazure.com/r#/applications" target="_blank" rel="noopener">https://account.activedirectory.windowsazure.com/r#/applications</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Azure </tag>
            
            <tag> SSO </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - RDS MySQL</title>
      <link href="2018/08/24/markdown/AWS/AWS2018/09a_RDS_MySQL/"/>
      <url>2018/08/24/markdown/AWS/AWS2018/09a_RDS_MySQL/</url>
      
        <content type="html"><![CDATA[<h1 id="building-your-application-with-an-amazon-aurora-database-dem113"><a class="markdownIt-Anchor" href="#building-your-application-with-an-amazon-aurora-database-dem113"></a> Building Your Application with an Amazon Aurora Database (DEM113)</h1><p><a href="https://youtu.be/-ychuATbqPY" target="_blank" rel="noopener">https://youtu.be/-ychuATbqPY</a></p><h2 id="key-new-feature"><a class="markdownIt-Anchor" href="#key-new-feature"></a> Key New Features</h2><ul><li>Serverless: automatically provisions the compute capacity you need and scales up and down automatically.</li><li>Aurora Parallel Query<ul><li>An option when provisioning the DB; suitable for a DB used for both transactions and analytics</li><li><a href="https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/</a></li><li>No extra charge for the feature, but I/O costs will be higher</li></ul></li><li>Enable Backtrack (select the backtrack window)<ul><li>Lets you backtrack (rewind) the DB; extra cost 10 USD/month</li></ul></li><li>Performance Insights<ul><li>by SQL, by user (session)</li></ul></li></ul><h1 id="running-a-high-performance-kubernetes-cluster-with-amazon-eks-con318-r1"><a class="markdownIt-Anchor" href="#running-a-high-performance-kubernetes-cluster-with-amazon-eks-con318-r1"></a> Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1)</h1><p><a href="https://youtu.be/YQWt6wdAZMU" target="_blank" rel="noopener">https://youtu.be/YQWt6wdAZMU</a></p><h2 id="optimize-pod-placement"><a class="markdownIt-Anchor" href="#optimize-pod-placement"></a> Optimize pod placement</h2><ol><li>Limit the resources</li><li>Density vs. 
size of pods</li><li>Anti-affinity: keep CPU-heavy pods on different hosts</li></ol><h2 id="use-diagram-to-balance-the-design"><a class="markdownIt-Anchor" href="#use-diagram-to-balance-the-design"></a> Use a diagram to balance the design</h2><ol><li>Anti-affinity</li><li>Secrets</li><li>Number of nodes</li><li>Active namespaces</li><li>Pod churn</li><li>Pod density</li><li>Networking</li></ol><h2 id="use-k8s-with-database"><a class="markdownIt-Anchor" href="#use-k8s-with-database"></a> Use K8S with Database</h2><p>When choosing the persistence layer you have three options: inside the pod, outside the pod but on the same host, or on a separate host.</p><p>37:12</p><h1 id="data-migration"><a class="markdownIt-Anchor" href="#data-migration"></a> Data migration</h1><ul><li>Take the backup from a replica (slave)</li><li>Compress the backup for transfer</li><li>Use primary key sort order where possible</li><li>To speed up data loading: more memory and IOPS</li><li>Disable binary logging</li><li>Change some of the configuration to reduce how often the server writes logs to disk (acceptable because this is a bulk dump and load, with no in-flight transactions to protect)</li></ul><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h2><p>binlog: the binary (transaction) log<br><a href="https://www.cnblogs.com/Cherie/p/3309503.html" target="_blank" rel="noopener">https://www.cnblogs.com/Cherie/p/3309503.html</a></p><p>The default is 0; v5.6 changed it to 1, without much impact on performance.</p><h2 id="data-loading-format"><a class="markdownIt-Anchor" href="#data-loading-format"></a> Data loading format</h2><p>SQL:</p><ul><li>easy and simple</li><li>for small DBs</li></ul><p>Flat files:</p><ul><li>schema load</li><li>fault tolerance (each file load is a separate transaction)</li></ul><h1 id="normal-steps-to-migrate-database"><a class="markdownIt-Anchor" href="#normal-steps-to-migrate-database"></a> Normal steps to migrate a database</h1><h2 id="from-on-premise-to-rds"><a class="markdownIt-Anchor" href="#from-on-premise-to-rds"></a> 
From on-premise to RDS</h2><ul><li>Configure the replication target and start replication</li><li>Stop the application bound to the original source; stop replication once the new target catches up</li><li>Promote the new target instance</li><li>Point the application to the new instance</li></ul><h2 id="from-rds-to-on-premise"><a class="markdownIt-Anchor" href="#from-rds-to-on-premise"></a> From RDS to on-premise</h2><ul><li>RDS provides point-in-time recovery</li></ul><h2 id="rds-data-to-redshift"><a class="markdownIt-Anchor" href="#rds-data-to-redshift"></a> RDS data to Redshift</h2><ul><li>Change the binlog format to “ROW”</li></ul><h1 id="multi-az-fail-over"><a class="markdownIt-Anchor" href="#multi-az-fail-over"></a> Multi-AZ Failover</h1><p>Failover takes around 1 minute:</p><ul><li>25 sec – detect failure</li><li>5 sec – promote standby</li><li>30 sec – CNAME (DNS) update</li><li>The standby sits in a different AZ; a read replica can sit in a different region</li></ul><h1 id="important-scaling-archi"><a class="markdownIt-Anchor" href="#important-scaling-archi"></a> Important scaling architecture</h1><p>(???)</p><ul><li>For read-intensive applications (e.g. 90% reads): create more read replicas</li><li>For write-intensive applications (e.g. 20% writes): 2 SCALE</li></ul><h1 id="rebooting-performance"><a class="markdownIt-Anchor" href="#rebooting-performance"></a> Rebooting performance</h1><p>If MySQL uses InnoDB as its engine, you can warm the cache when rebooting to improve performance. 
The feature is called cache warming: the buffer pool is saved before shutdown and reloaded on startup.</p><h1 id="handle-schema-change"><a class="markdownIt-Anchor" href="#handle-schema-change"></a> Handle Schema Change</h1><ul><li>Option 1: the promote-standby approach</li><li>Option 2: use the MySQL 5.6 online DDL feature<ul><li>no blocked DML in most cases</li><li>Performance impact: data reorganization (sometimes), CPU and I/O, replica lag</li><li>45 min</li></ul></li><li>pt-online-schema-change tool<ul><li>Less performance impact, but takes longer (2 hours)</li><li>Requires launching an EC2 instance and installing the tool</li></ul></li></ul><h1 id="burst-mode"><a class="markdownIt-Anchor" href="#burst-mode"></a> Burst mode</h1><p>gp2 volumes are designed to burst IOPS<br>T2 instances are designed to burst CPU</p><ul><li>Newer instance types with the burst feature can save costs</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://youtu.be/ZQnzjhnDloM" target="_blank" rel="noopener">https://youtu.be/ZQnzjhnDloM</a></p><h1 id="difference-between-mysql-and-mariadb"><a class="markdownIt-Anchor" href="#difference-between-mysql-and-mariadb"></a> Difference between MySQL and MariaDB</h1><p><a href="https://blog.panoply.io/a-comparative-vmariadb-vs-mysql" target="_blank" rel="noopener">https://blog.panoply.io/a-comparative-vmariadb-vs-mysql</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS RDS </tag>
            
            <tag> MySQL </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Kinesis</title>
      <link href="2018/08/03/markdown/AWS/AWS2018/023a_Kinesis/"/>
      <url>2018/08/03/markdown/AWS/AWS2018/023a_Kinesis/</url>
      
        <content type="html"><![CDATA[<h1 id="kinesis-deepdive"><a class="markdownIt-Anchor" href="#kinesis-deepdive"></a> Kinesis Deep Dive</h1><ul><li>No. 1 scenario: moving small, fast-moving data into a persistence layer</li><li>No. 2 scenario: streaming data, near-real-time (NRT) notification systems</li></ul><p>Kinesis:</p><ul><li>managed service</li><li>streaming data ingestion</li><li>continuous processing</li></ul><p>Small, fast-moving data is captured quickly, then consumed concurrently by multiple consumers for different analytics purposes.</p><ul><li>You can split / merge shards via the console</li></ul><h2 id="best-practises"><a class="markdownIt-Anchor" href="#best-practises"></a> Best practices</h2><h3 id="partition-key-strategy"><a class="markdownIt-Anchor" href="#partition-key-strategy"></a> Partition key strategy</h3><ul><li>Avoid hot shards<ul><li>use a random partition key</li><li>use a high-cardinality key</li><li>use a business key: per billing customer, per device ID, or per stock symbol</li></ul></li></ul><h3 id="provision-shards"><a class="markdownIt-Anchor" href="#provision-shards"></a> Provision shards</h3><ul><li>provision enough shards</li><li>leave some headroom in case of application failures</li></ul><h3 id="put-data-into-kinesis"><a class="markdownIt-Anchor" href="#put-data-into-kinesis"></a> Put data into Kinesis</h3><ul><li>micro-batch records before putting them</li><li>consider an asynchronous producer via the AWS SDK<ul><li>Kinesis-Log4j-Appender</li></ul></li><li><strong>ProvisionedThroughputExceeded error</strong><ul><li>retry</li><li>re-shard</li><li>track &amp; monitor</li></ul></li><li>command to scale up (Kinesis Scaling Utils)</li></ul><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=myStream -Dscaling-action=scaleUp -Dcount=<span class="number">10</span> -Dregion=eu-west-<span 
class="number">1</span></span><br></pre></td></tr></table></figure><h3 id="ingest-data-from-kinesis"><a class="markdownIt-Anchor" href="#ingest-data-from-kinesis"></a> Ingest data from Kinesis</h3><ul><li>Amazon KCL (Kinesis Client Library)<ul><li>one worker maps to one shard</li><li>connector library to feed data into S3, DynamoDB, Redshift, Elasticsearch</li><li>data is fed through the following pipeline:<ul><li>ITransformer: transforms the data read from Kinesis</li><li>IFilter: keeps only the data of interest</li><li>IBuffer: batches the data before sending it out (for S3 or Redshift, buffer to the MB level before sending)</li></ul></li><li>the Redshift connector writes data to S3 first, buffers it, then loads it into Redshift</li></ul></li><li>consuming applications should be able to scale automatically</li><li>use metrics to find out why a consumer is slow<ul><li>GetRecords.Latency</li></ul></li><li>build a flush-to-S3 consumer to capture the original data (flushing by record count, by bytes, or by time)</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/8u9wIC1xNt8" target="_blank" rel="noopener">https://youtu.be/8u9wIC1xNt8</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Kinesis </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Redshift Deepdive</title>
      <link href="2018/07/29/markdown/AWS/AWS2018/023a_RedShift/"/>
      <url>2018/07/29/markdown/AWS/AWS2018/023a_RedShift/</url>
      
        <content type="html"><![CDATA[<h1 id="redshift-archi-overview"><a class="markdownIt-Anchor" href="#redshift-archi-overview"></a> Redshift Architecture Overview</h1><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/023_RedShiftClusterArchi.png?raw=true" alt="Redshift Cluster Archi "></p><ul><li>Bottom layer: ingestion, backup &amp; restore</li><li>Leader node &amp; compute nodes<ul><li>Leader node:</li></ul></li><li>Shared-nothing MPP (massively parallel processing) architecture</li><li>Reduce I/O<ul><li>Columnar storage</li><li>Data compression (by column)</li><li><strong>Zone maps</strong>: in-memory min and max values of a column for each block, used to prune blocks from a query and reduce I/O</li></ul></li><li><strong>Slices</strong><ul><li>each node supports a number of slices that depends on its CPU cores</li><li>unit of data partitioning / parallel processing</li><li>table rows are distributed across slices</li></ul></li><li>Data distribution:<ul><li>ALL; KEY; EVEN (round robin)</li></ul></li><li>Two types of storage hardware<ul><li>HDD is slower but can scale to petabytes (2 PB); SSD is faster but only scales to 300+ TB</li></ul></li></ul><h2 id="storage-deep-dive"><a class="markdownIt-Anchor" href="#storage-deep-dive"></a> Storage Deep Dive</h2><ul><li>Advertised (priced) storage is 1/3 of the actual raw storage, because 2/3 is used for data copies.</li><li><strong>Blocks</strong>: column data is persisted as 1 MB immutable blocks.<ul><li>with zone map metadata</li><li>location of the next block</li><li>can be compressed</li></ul></li><li>A small write costs about the same as a larger write (writing 1 to 10 rows costs about as much as 100k rows)</li><li>UPDATE &amp; DELETE only soft-delete rows; use VACUUM or a deep copy to remove the ghost rows</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/iuQgZDs-W7A" target="_blank" 
rel="noopener">https://youtu.be/iuQgZDs-W7A</a></p></blockquote><h1 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h1><ul><li>&lt; $1k per TB per year</li></ul><h2 id="data-ingestion"><a class="markdownIt-Anchor" href="#data-ingestion"></a> Data Ingestion</h2><ul><li><p>Ingestion sources: SSH, S3, EMR, DynamoDB</p></li><li><p>With the COPY command, each slice loads data single-threaded (one file at a time).</p><ul><li>To reach 100 MB/s, you need multiple slices on multiple nodes</li><li>Batch inserts save commit cost</li><li>With 16 slices, split the input into 16 files (or a multiple of 16) and load them with a single COPY command; Redshift loads the files in parallel, whereas concurrent COPY commands into the same table are serialized</li><li>Redshift doesn’t enforce primary key constraints, including during COPY</li><li>When copying from S3, provide a manifest file (JSON, stored on S3) so that exactly the intended files are loaded</li></ul></li><li><p>Redshift’s query optimizer depends on table statistics</p><ul><li>COPY updates statistics automatically</li></ul></li><li><p>Redshift data compression</p><ul><li>COPY applies compression and selects column encodings automatically</li></ul></li><li><p>Data hygiene</p><ul><li>Run ANALYZE regularly</li><li>Run VACUUM regularly (weekly) to re-sort data</li><li>Use SVV_TABLE_INFO</li></ul></li><li><p>Automatic compression</p><ul><li>Don’t compress sort key columns<ul><li>It might cause you to scan more rows than needed (compression packs many rows into one block)</li></ul></li></ul></li><li><p>VARCHAR columns (define them as small as possible)</p><ul><li>oversized VARCHAR columns waste memory, so fewer rows fit in memory during a query and more data spills to disk</li></ul></li><li><p>Compound sort keys</p></li><li><p>Don’t forklift</p></li><li><p>On Redshift:</p><ul><li>UPDATE = DELETE + INSERT</li><li>Commits are expensive; blocks are immutable (1 MB), so load rows in batches (e.g. 1k rows at a time)</li><li>Avoid small commits</li><li>Concurrency should be low for better throughput</li></ul></li><li><p>Add a cache layer between Redshift and dashboards</p></li><li><p><strong>Workload 
Management</strong></p></li></ul><h2 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h2><ul><li>Source data from S3: use envelope encryption</li><li>Encrypt data at rest<ul><li>enable when creating the cluster</li><li>Hardware acceleration (HSM)</li><li>~20% performance impact</li><li>4 tiers of keys: block; database; cluster; master<ul><li>Benefit: key rotation only re-encrypts the keys in the hierarchy with a new key, not the whole data set</li></ul></li></ul></li><li>Encrypt specific columns to restrict which customers can view them</li><li>UNLOAD (from Redshift to S3 files) can encrypt the output automatically</li></ul><h2 id="udf-user-defined-functions"><a class="markdownIt-Anchor" href="#udf-user-defined-functions"></a> UDF: User-Defined Functions</h2><ul><li>Write UDFs in Python</li><li>Aggregate UDF<ul><li>you need to implement an init function, an aggregation function, and a finalize function</li></ul></li></ul><h2 id="multi-demintional-indexing-with-space-filling-curves"><a class="markdownIt-Anchor" href="#multi-demintional-indexing-with-space-filling-curves"></a> Multi-dimensional indexing with space-filling curves</h2><ul><li>As data grows, you start to rely on<ul><li><strong>Zone maps</strong>: store the min and max values of each block in memory</li><li>Sorting</li><li>Projections: multiple copies of the data, each sorted differently</li></ul></li><li>New sort key keyword: <strong>INTERLEAVED</strong></li></ul><h2 id="user-reference"><a class="markdownIt-Anchor" href="#user-reference"></a> User reference</h2><ul><li>Automation framework: Azkaban (LinkedIn)</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/fmy3jCxUliM" target="_blank" rel="noopener">https://youtu.be/fmy3jCxUliM</a></p></blockquote><blockquote><p>Deep Dive 2014<br><a href="https://youtu.be/K-Usisr0zwg" target="_blank" 
rel="noopener">https://youtu.be/K-Usisr0zwg</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Redshift </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Buzz Words</title>
      <link href="2018/07/25/markdown/BackToBasic/buzzwords/"/>
      <url>2018/07/25/markdown/BackToBasic/buzzwords/</url>
      
        <content type="html"><![CDATA[<h1 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h1><p>Symmetric vs. asymmetric encryption</p><h1 id="blockchain"><a class="markdownIt-Anchor" href="#blockchain"></a> Blockchain</h1><ul><li>Cryptographically verifiable</li></ul><h1 id="security-2"><a class="markdownIt-Anchor" href="#security-2"></a> Security</h1><ul><li>Blast radius</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> buzz words </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Tibco -- BusinessWorks</title>
      <link href="2018/07/20/markdown/TechByVendorName/Tibco/BuildImage/"/>
      <url>2018/07/20/markdown/TechByVendorName/Tibco/BuildImage/</url>
      
        <content type="html"><![CDATA[<p>Trying to create a Docker image used to build the EAR.</p><figure class="highlight docker"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># simulate the prod environment: pin this tag to the Ubuntu version used in prod</span></span><br><span class="line"><span class="keyword">FROM</span> ubuntu:latest</span><br><span class="line"><span class="comment"># unzip is not included in the base image</span></span><br><span class="line"><span class="keyword">RUN</span><span class="bash"> apt-get update &amp;&amp; apt-get install -y unzip</span></span><br><span class="line"><span class="keyword">COPY</span><span class="bash"> ./TIB_BW_6.4.2_linux26gl23_x86_64.zip /installtb/</span></span><br><span class="line"><span class="keyword">COPY</span><span class="bash"> ./TIB_bwpluginftl_6.4.1_linux26gl23_x86_64.zip /installplugin/</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># RUN executes at build time (CMD only sets the container's default command); WORKDIR replaces cd</span></span><br><span class="line"><span class="keyword">WORKDIR</span><span class="bash"> /installtb</span></span><br><span class="line"><span class="keyword">RUN</span><span class="bash"> unzip TIB_BW_6.4.2_linux26gl23_x86_64.zip &amp;&amp; ./TIBCOUniversalInstaller-lnx-x86-64.bin -silent -V responseFile=<span class="string">'TIBCOUniversalInstaller_BW_6.4.2.silent'</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">WORKDIR</span><span class="bash"> /installplugin</span></span><br><span class="line"><span class="keyword">RUN</span><span class="bash"> unzip TIB_bwpluginftl_6.4.1_linux26gl23_x86_64.zip &amp;&amp; ./TIBCOUniversalInstaller-lnx-x86-64.bin -silent -V responseFile=<span class="string">'TIBCOUniversalInstaller_bwpluginftl_6.4.1.silent'</span></span></span><br><span class="line"></span><br><span class="line"><span class="comment"># remove the installer files</span></span><br><span class="line"><span class="keyword">WORKDIR</span><span class="bash"> /</span></span><br><span class="line"><span class="keyword">RUN</span><span class="bash"> rm -rf /installplugin /installtb</span></span><br></pre></td></tr></table></figure><p>docker build -t liuruibnu/bw641:v1 .<br>docker run -it liuruibnu/bw641:v1 ls /opt/tibco<br>docker run -it liuruibnu/bw641:v1 sh -c 'ls ~/.TIBCO/'</p>]]></content>
      
      
      
        <tags>
            
            <tag> Tibco </tag>
            
            <tag> BusinessWorks </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Hybrid Architecture</title>
      <link href="2018/07/18/markdown/AWS/AWS2018/BestPractises_Hybrid/"/>
      <url>2018/07/18/markdown/AWS/AWS2018/BestPractises_Hybrid/</url>
      
        <content type="html"><![CDATA[<h1 id="customer-case"><a class="markdownIt-Anchor" href="#customer-case"></a> Customer Case</h1><ul><li>Bring efficiency to deployment</li><li>A single provider with capability in all regions</li><li>Take hot SQL Server dumps and put them into S3</li><li>Issues running Oracle RAC on AWS</li><li>All environments must be PCI DSS compliant (Payment Card Industry Data Security Standard)</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/x-DynRJUugU" target="_blank" rel="noopener">https://youtu.be/x-DynRJUugU</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS Best practise </tag>
            
            <tag> Hybrid Architecture </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - API Gateway</title>
      <link href="2018/07/09/markdown/AWS/AWS2018/Extra_AWS_API_Gateway/"/>
      <url>2018/07/09/markdown/AWS/AWS2018/Extra_AWS_API_Gateway/</url>
      
        <content type="html"><![CDATA[<h1 id="api-gateway-and-lambda"><a class="markdownIt-Anchor" href="#api-gateway-and-lambda"></a> API Gateway and Lambda</h1><ul><li>Make use of IAM to manage security</li><li>Swagger import and client SDK generation let us automate most workflows</li><li>APIs are deployed from Swagger definitions</li></ul><h1 id="key-feature-of-api-gateway"><a class="markdownIt-Anchor" href="#key-feature-of-api-gateway"></a> Key features of API Gateway</h1><ol><li>Define and host different versions of APIs</li><li>Manage network traffic</li><li>Auth (authentication and authorization)</li></ol><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/ZBxWZ9bgd44" target="_blank" rel="noopener">https://youtu.be/ZBxWZ9bgd44</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS API Gateway </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Docker command cheat sheet</title>
      <link href="2018/07/08/markdown/Trending/Docker/docker-cmdCheetsheet/"/>
      <url>2018/07/08/markdown/Trending/Docker/docker-cmdCheetsheet/</url>
      
        <content type="html"><![CDATA[<h1 id="build-a-image"><a class="markdownIt-Anchor" href="#build-a-image"></a> Build an image</h1><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">docker build -t account/repo:tag .</span><br><span class="line"><span class="meta">#</span><span class="bash"> specify the build file</span></span><br><span class="line">docker build -t account/repo:tag --file dockerfilename .</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> debug a generated image: open a shell inside it</span></span><br><span class="line">docker run -it --rm --entrypoint sh account/repo:tag</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> bind-mount the source, the local Maven repo, and the build output</span></span><br><span class="line">docker run -it -v <span class="string">"<span class="variable">$PWD</span>"</span>:/usr/src/mymaven -v <span class="string">"<span class="variable">$HOME</span>/.m2"</span>:/root/.m2 -v <span class="string">"<span class="variable">$PWD</span>/target:/usr/src/mymaven/target"</span> -w /usr/src/mymaven maven mvn clean package</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> create named volumes</span></span><br><span class="line">docker volume create workspace</span><br><span class="line">docker volume create m2repo</span><br><span class="line"></span><br><span class="line">docker run -it -v workspace:/workspace -v m2repo:/root/.m2 -w /workspace maven mvn archetype:generate # will download artifacts into m2repo</span><br><span class="line">docker run -it -v workspace:/workspace -v m2repo:/root/.m2 -w /workspace maven mvn archetype:generate # will reuse the downloaded artifacts</span><br><span class="line">docker run etpartner/tibco:bw642ftl ls /opt/tibco/bw/bw/6.4/bin/bwdesign</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> Docker Cloud: create a repo first, then</span></span><br><span class="line">docker login</span><br><span class="line">docker push account/repo:tag</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> docker </tag>
            
            <tag> Cheetsheet </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Cloudformation for EC2</title>
      <link href="2018/07/08/markdown/AWS/AWS2018/Cloudformation/EC2_snippets/"/>
      <url>2018/07/08/markdown/AWS/AWS2018/Cloudformation/EC2_snippets/</url>
      
        <content type="html"><![CDATA[<p><code>ssh -i /path/my-key-pair.pem ec2-user@ec2-198-51-100-1.compute-1.amazonaws.com</code></p><h1 id="cloudformation-for-ec2"><a class="markdownIt-Anchor" href="#cloudformation-for-ec2"></a> CloudFormation for EC2</h1><p><a href="https://github.com/awslabs/aws-cloudformation-templates/blob/master/aws/services/EC2/EC2InstanceWithSecurityGroupSample.yaml" target="_blank" rel="noopener">https://github.com/awslabs/aws-cloudformation-templates/blob/master/aws/services/EC2/EC2InstanceWithSecurityGroupSample.yaml</a></p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span
class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">AWSTemplateFormatVersion:</span> <span class="string">'2010-09-09'</span></span><br><span class="line"><span class="attr">Metadata:</span></span><br><span class="line">  <span class="attr">License:</span> <span class="string">Apache-2.0</span></span><br><span class="line"><span class="attr">Description:</span> <span class="string">'AWS CloudFormation Sample Template EC2InstanceWithSecurityGroupSample:</span></span><br><span class="line"><span class="string">  Create an Amazon EC2 instance running the Amazon Linux AMI. The AMI is chosen based</span></span><br><span class="line"><span class="string">  on the region in which the stack is run. This example creates an EC2 security group</span></span><br><span class="line"><span class="string">  for the instance to give you SSH access. 
**WARNING** This template creates an Amazon</span></span><br><span class="line"><span class="string">  EC2 instance. You will be billed for the AWS resources used if you create a stack</span></span><br><span class="line"><span class="string">  from this template.'</span></span><br><span class="line"><span class="attr">Parameters:</span></span><br><span class="line">  <span class="attr">KeyName:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">Name</span> <span class="string">of</span> <span class="string">an</span> <span class="string">existing</span> <span class="string">EC2</span> <span class="string">KeyPair</span> <span class="string">to</span> <span class="string">enable</span> <span class="string">SSH</span> <span class="string">access</span> <span class="string">to</span> <span class="string">the</span> <span class="string">instance</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">AWS::EC2::KeyPair::KeyName</span></span><br><span class="line">    <span class="attr">ConstraintDescription:</span> <span class="string">must</span> <span class="string">be</span> <span class="string">the</span> <span class="string">name</span> <span class="string">of</span> <span class="string">an</span> <span class="string">existing</span> <span class="string">EC2</span> <span class="string">KeyPair.</span></span><br><span class="line">  <span class="attr">InstanceType:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">WebServer</span> <span class="string">EC2</span> <span class="string">instance</span> <span class="string">type</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">String</span></span><br><span class="line">    <span class="attr">Default:</span> <span class="string">t1.micro</span></span><br><span class="line">    <span class="attr">AllowedValues:</span> <span 
class="string">[t1.micro]</span></span><br><span class="line">    <span class="attr">ConstraintDescription:</span> <span class="string">must</span> <span class="string">be</span> <span class="string">a</span> <span class="string">valid</span> <span class="string">EC2</span> <span class="string">instance</span> <span class="string">type.</span></span><br><span class="line">  <span class="attr">SSHLocation:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">The</span> <span class="string">IP</span> <span class="string">address</span> <span class="string">range</span> <span class="string">that</span> <span class="string">can</span> <span class="string">be</span> <span class="string">used</span> <span class="string">to</span> <span class="string">SSH</span> <span class="string">to</span> <span class="string">the</span> <span class="string">EC2</span> <span class="string">instances</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">String</span></span><br><span class="line">    <span class="attr">MinLength:</span> <span class="number">9</span></span><br><span class="line">    <span class="attr">MaxLength:</span> <span class="number">18</span></span><br><span class="line">    <span class="attr">Default:</span> <span class="number">0.0</span><span class="number">.0</span><span class="number">.0</span><span class="string">/0</span></span><br><span class="line">    <span class="attr">AllowedPattern:</span> <span class="string">(\d&#123;1,3&#125;)\.(\d&#123;1,3&#125;)\.(\d&#123;1,3&#125;)\.(\d&#123;1,3&#125;)/(\d&#123;1,2&#125;)</span></span><br><span class="line">    <span class="attr">ConstraintDescription:</span> <span class="string">must</span> <span class="string">be</span> <span class="string">a</span> <span class="string">valid</span> <span class="string">IP</span> <span class="string">CIDR</span> <span class="string">range</span> <span class="string">of</span> <span 
class="string">the</span> <span class="string">form</span> <span class="string">x.x.x.x/x.</span></span><br><span class="line"><span class="attr">Mappings:</span></span><br><span class="line">  <span class="attr">AWSInstanceType2Arch:</span></span><br><span class="line">    <span class="attr">t1.micro:</span></span><br><span class="line">      <span class="attr">Arch:</span> <span class="string">PV64</span></span><br><span class="line">  <span class="attr">AWSInstanceType2NATArch:</span></span><br><span class="line">    <span class="attr">t1.micro:</span></span><br><span class="line">      <span class="attr">Arch:</span> <span class="string">NATPV64</span></span><br><span class="line">  <span class="attr">AWSRegionArch2AMI:</span></span><br><span class="line">    <span class="attr">us-east-1:</span></span><br><span class="line">      <span class="attr">PV64:</span> <span class="string">ami-2a69aa47</span></span><br><span class="line">      <span class="attr">HVM64:</span> <span class="string">ami-6869aa05</span></span><br><span class="line">      <span class="attr">HVMG2:</span> <span class="string">ami-50b4f047</span></span><br><span class="line">    <span class="attr">us-east-2:</span></span><br><span class="line">      <span class="attr">PV64:</span> <span class="string">NOT_SUPPORTED</span></span><br><span class="line">      <span class="attr">HVM64:</span> <span class="string">ami-f6035893</span></span><br><span class="line">      <span class="attr">HVMG2:</span> <span class="string">NOT_SUPPORTED</span></span><br><span class="line"><span class="attr">Resources:</span></span><br><span class="line">  <span class="attr">EC2Instance:</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">AWS::EC2::Instance</span></span><br><span class="line">    <span class="attr">Properties:</span></span><br><span class="line">      <span class="attr">InstanceType:</span> <span class="type">!Ref</span> <span 
class="string">'InstanceType'</span></span><br><span class="line">      <span class="attr">SecurityGroups:</span> <span class="string">[!Ref</span> <span class="string">'InstanceSecurityGroup'</span><span class="string">]</span></span><br><span class="line">      <span class="attr">KeyName:</span> <span class="type">!Ref</span> <span class="string">'KeyName'</span></span><br><span class="line">      <span class="attr">ImageId:</span> <span class="type">!FindInMap</span> <span class="string">[AWSRegionArch2AMI,</span> <span class="type">!Ref</span> <span class="string">'AWS::Region'</span><span class="string">,</span> <span class="type">!FindInMap</span> <span class="string">[AWSInstanceType2Arch,</span></span><br><span class="line">          <span class="type">!Ref</span> <span class="string">'InstanceType'</span><span class="string">,</span> <span class="string">Arch]]</span></span><br><span class="line">  <span class="attr">InstanceSecurityGroup:</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">AWS::EC2::SecurityGroup</span></span><br><span class="line">    <span class="attr">Properties:</span></span><br><span class="line">      <span class="attr">GroupDescription:</span> <span class="string">Enable</span> <span class="string">SSH</span> <span class="string">access</span> <span class="string">via</span> <span class="string">port</span> <span class="number">22</span></span><br><span class="line">      <span class="attr">SecurityGroupIngress:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">IpProtocol:</span> <span class="string">tcp</span></span><br><span class="line">        <span class="attr">FromPort:</span> <span class="number">22</span></span><br><span class="line">        <span class="attr">ToPort:</span> <span class="number">22</span></span><br><span class="line">        <span class="attr">CidrIp:</span> <span class="type">!Ref</span> <span 
class="string">'SSHLocation'</span></span><br><span class="line"><span class="attr">Outputs:</span></span><br><span class="line">  <span class="attr">InstanceId:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">InstanceId</span> <span class="string">of</span> <span class="string">the</span> <span class="string">newly</span> <span class="string">created</span> <span class="string">EC2</span> <span class="string">instance</span></span><br><span class="line">    <span class="attr">Value:</span> <span class="type">!Ref</span> <span class="string">'EC2Instance'</span></span><br><span class="line">  <span class="attr">AZ:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">Availability</span> <span class="string">Zone</span> <span class="string">of</span> <span class="string">the</span> <span class="string">newly</span> <span class="string">created</span> <span class="string">EC2</span> <span class="string">instance</span></span><br><span class="line">    <span class="attr">Value:</span> <span class="type">!GetAtt</span> <span class="string">[EC2Instance,</span> <span class="string">AvailabilityZone]</span></span><br><span class="line">  <span class="attr">PublicDNS:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">Public</span> <span class="string">DNSName</span> <span class="string">of</span> <span class="string">the</span> <span class="string">newly</span> <span class="string">created</span> <span class="string">EC2</span> <span class="string">instance</span></span><br><span class="line">    <span class="attr">Value:</span> <span class="type">!GetAtt</span> <span class="string">[EC2Instance,</span> <span class="string">PublicDnsName]</span></span><br><span class="line">  <span class="attr">PublicIP:</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">Public</span> <span 
class="string">IP</span> <span class="string">address</span> <span class="string">of</span> <span class="string">the</span> <span class="string">newly</span> <span class="string">created</span> <span class="string">EC2</span> <span class="string">instance</span></span><br><span class="line">    <span class="attr">Value:</span> <span class="type">!GetAtt</span> <span class="string">[EC2Instance,</span> <span class="string">PublicIp]</span></span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> EC2 </tag>
            
            <tag> CloudFormation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Cloudformation for S3</title>
      <link href="2018/07/08/markdown/AWS/AWS2018/Cloudformation/S3_snippets/"/>
      <url>2018/07/08/markdown/AWS/AWS2018/Cloudformation/S3_snippets/</url>
      
        <content type="html"><![CDATA[<h1 id="template-for-s3-bitbucket"><a class="markdownIt-Anchor" href="#template-for-s3-bitbucket"></a> Template for an S3 bucket</h1><ul><li>Deleting the stack deletes the S3 bucket (CloudFormation can only delete the bucket if it is empty)</li></ul><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">AWSTemplateFormatVersion:</span> <span class="string">'2010-09-09'</span></span><br><span class="line"><span class="attr">Description:</span> <span class="string">"Sample CloudFormation template: defines a stack with an S3 bucket"</span></span><br><span class="line"><span class="attr">Resources:</span></span><br><span class="line">  <span class="attr">MyS3Bucket:</span></span><br><span class="line">    <span class="attr">Type:</span> <span class="string">AWS::S3::Bucket</span></span><br><span class="line">    <span class="attr">Properties:</span></span><br><span class="line">      <span class="attr">AccessControl:</span> <span class="string">PublicRead</span></span><br><span class="line">      <span class="attr">Tags:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">Key:</span> <span class="string">"S3BucketName"</span></span><br><span class="line">          <span class="attr">Value:</span> <span class="string">"Dev"</span></span><br><span class="line"><span class="attr">Outputs:</span></span><br><span class="line">  <span class="attr">BucketName:</span></span><br><span class="line">    <span class="attr">Value:</span> <span class="type">!Ref</span> <span class="string">'MyS3Bucket'</span></span><br><span class="line">    <span class="attr">Description:</span> <span class="string">Name</span> <span class="string">of</span> <span class="string">the</span> <span class="string">sample</span> <span class="string">Amazon</span> <span class="string">S3</span> <span class="string">bucket.</span></span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> S3 </tag>
            
            <tag> CloudFormation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudSearch</title>
      <link href="2018/07/02/markdown/AWS/AWS2018/Extra_AWS_CloudSearch/"/>
      <url>2018/07/02/markdown/AWS/AWS2018/Extra_AWS_CloudSearch/</url>
      
        <content type="html"><![CDATA[<h1 id="aws-cloudsearch"><a class="markdownIt-Anchor" href="#aws-cloudsearch"></a> AWS CloudSearch</h1><h2 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h2><p>Key difference between a database query and a search engine:<br>A database query returns exact matches; a search engine returns the best (most relevant) results.</p><h2 id="deepdive"><a class="markdownIt-Anchor" href="#deepdive"></a> Deep Dive</h2><ul><li>The core is based on Apache Solr</li><li>Topics covered in the deep dive<ul><li>Setting up</li><li>Scaling</li><li>Querying</li><li>Architecting</li></ul></li></ul><h2 id="setting-up"><a class="markdownIt-Anchor" href="#setting-up"></a> Setting up</h2><ul><li><p>Create and configure a domain (CLI)</p></li><li><p>Create batches (pull data from the data store and convert it to a Solr-supported format)</p><ul><li>Use maximum-size batches (5 MB)</li><li>Convert data: remove invalid characters</li></ul></li><li><p>Integrate with IAM (controls who can connect to which domain)</p></li><li><p>Integrate with CloudTrail</p></li></ul><h2 id="scaling"><a class="markdownIt-Anchor" href="#scaling"></a> Scaling</h2><ul><li>Tip: use a larger instance type for the initial load<ul><li>Test with different types of data (tweets vs. web data)</li><li>Indexing options affect index size</li><li>Upload data with multiple threads (test the limits to avoid HTTP 500 errors)</li></ul></li><li>Use multiple partitions</li><li>Pre-warm for traffic spikes</li></ul><h2 id="query"><a class="markdownIt-Anchor" href="#query"></a> Query</h2><ul><li>REST API</li><li>Geo filtering</li><li>fq (filters results without affecting relevance scores) vs. q</li><li>Geo sorting (sort by distance)</li><li>Boosting (give certain keywords a higher ranking)</li><li>Multi-language support</li><li>AWS SDK</li></ul><h2 id="architecting"><a class="markdownIt-Anchor" href="#architecting"></a> Architecting</h2><ul><li>Cache: add ElastiCache in front of CloudSearch</li><li>Consider multi-tenancy<ul><li>Option 1<ul><li>Feed all data into the same domain</li><li>Use a field holding the customer ID</li></ul></li><li>Option 2<ul><li>Multiple domains</li></ul></li><li>How to choose:<ul><li>If each customer has a very different configuration, use multiple domains</li><li>If customers have similar configurations but there are many of them, setting up one domain per customer is not cost-effective</li></ul></li></ul></li><li>Mine user behavior to improve results (log user searches, analyze them in EMR, and feed the results back in as search parameters)<ul><li>Helps with document boosting, augmentation, and synonym creation</li></ul></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>AWS CloudSearch DeepDive and Best Practices<br><a href="https://youtu.be/OeHaj1a66I4" target="_blank" rel="noopener">https://youtu.be/OeHaj1a66I4</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CloudSearch </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Workspace</title>
      <link href="2018/07/02/markdown/AWS/AWS2018/Extra_AWS_Workspace/"/>
      <url>2018/07/02/markdown/AWS/AWS2018/Extra_AWS_Workspace/</url>
      
        <content type="html"><![CDATA[<h1 id="aws-workspaces-and-workspaces-application-manager"><a class="markdownIt-Anchor" href="#aws-workspaces-and-workspaces-application-manager"></a> AWS WorkSpaces and WorkSpaces Application Manager</h1><ul><li>One user one VM; data based on EBS</li><li>AWS workspaces is for desktops; AWS EC2 is for servers</li><li>Integrate with existing tools: AD;Intranet; MFA; SCCM (System Center Configuration Manager)</li><li>Work well with a lot of patterns<ul><li>BYOD (bring your own devices)</li></ul></li></ul><h2 id="updates"><a class="markdownIt-Anchor" href="#updates"></a> Updates</h2><ul><li>For standard bundle, no upgrade cost</li><li>support BYOL (bring your won lisence)</li><li>Volumn Encryption with AWS KMS</li><li>FMA (Radius )<ul><li>Remote Authentication Dial-In User Service (RADIUS) is a networking protocol, operating on port 1812 that provides centralized Authentication, Authorization</li></ul></li><li>Certification - SOC 1,2 ISO9001 and 27001</li></ul><h2 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h2><ul><li>after setting up, connect to existin AD</li><li>Using wizard to launch workspace<ul><li>create user in AD</li><li>select bundle ( laptop with certain images )</li><li>wait 20 min till workspace fully launched</li></ul></li><li>User use workspace client to connect to the server</li></ul><h1 id="wam-workspaces-application-manager"><a class="markdownIt-Anchor" href="#wam-workspaces-application-manager"></a> WAM (workspaces Application manager)</h1><ul><li>Deploy track and update apps on user’s workspaces</li><li>bring your won apss or subscribe apps from aws marketpalce</li><li>Gain availabity and control over app usage</li><li>support app versioning</li></ul><h2 id="demo-2"><a class="markdownIt-Anchor" href="#demo-2"></a> Demo</h2><ul><li>WAM has a catelog containing all apps the current account owns</li><li>select muti apps from catelog and assign to AD user or groups</li><li>during assign wizard, 
the configuration options allow you to<ul><li>assign a specific version of the apps</li><li>set the installation type (optional or required)</li><li>choose auto install or optional install</li></ul></li><li>Uninstall from WAM<ul><li>by removing the user from the app subscription</li></ul></li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote></blockquote><p><a href="https://youtu.be/uE4x5kWVox4" target="_blank" rel="noopener">https://youtu.be/uE4x5kWVox4</a></p><h1 id="workspaces-best-practises"><a class="markdownIt-Anchor" href="#workspaces-best-practises"></a> WorkSpaces Best Practices</h1><h2 id="background"><a class="markdownIt-Anchor" href="#background"></a> Background</h2><ul><li>2000 users</li><li>Has AWS Direct Connect</li></ul><h2 id="aws-account-structure"><a class="markdownIt-Anchor" href="#aws-account-structure"></a> AWS Account Structure</h2><ul><li>Payer/linked account structure</li><li>Only central logging in the payer account</li><li>WorkSpaces in a separate account</li><li>Consistent tagging standards across all accounts</li><li>Set up IAM access to allow the L1 helpdesk to reboot WorkSpaces</li><li>Use a dedicated AWS account for user management</li></ul><h2 id="network-deployment-considerations"><a class="markdownIt-Anchor" href="#network-deployment-considerations"></a> Network Deployment Considerations</h2><h3 id="vpc-design-best-practise"><a class="markdownIt-Anchor" href="#vpc-design-best-practise"></a> VPC design best practices</h3><ul><li><p>Rule 1: Eliminate IP waste (be frugal with what you use)</p></li><li><p>Rule 2: Minimum of 2 subnets</p></li><li><p>Rule 3: Be flexible enough to accommodate future growth.</p></li><li><p>For 2k users, design a /20, which gives 4,096 IP addresses (AWS reserves 5 addresses in every subnet)</p></li><li><p>Use subnets to isolate DEV/PROD environments</p></li><li><p>Set up cross-account VPC peering. 
(AWS CLI)</p></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/Extra_AWS_Workspace_Diagram.png?raw=true" alt="AWS WorkSpaces network topology"></p><h2 id="directory-services-design-consideration"><a class="markdownIt-Anchor" href="#directory-services-design-consideration"></a> Directory Services Design Considerations</h2><ul><li>AWS AD instances sit in the management account that hosts AD</li><li>Use VPC peering to connect to another account that hosts the VPC running the WorkSpaces instances (with AD Connector deployed).</li><li>Set up AD Sites and Services (to avoid routing back on-premises to authenticate); this is configured inside AD (add the region name to the site naming convention)</li></ul><h2 id="demo-with-configuration"><a class="markdownIt-Anchor" href="#demo-with-configuration"></a> Demo with configuration</h2><ul><li>Enable/disable MFA<ul><li>Specify the RADIUS server;</li><li>Provision multiple RADIUS servers for HA; RADIUS can be hosted on EC2; implement multiple AD Connectors (ADC) to support multiple RADIUS standards</li></ul></li><li>Settings to connect with AD</li><li>Enable/disable local administrator</li><li>Use AD Sites and Services to ensure correct operation of Directory Service</li><li>Use a separate WorkSpaces OU to apply Group Policy</li><li>Check that ingress ports are open for Directory Service communications</li><li>Use AD Group Policy to manage WorkSpaces</li><li>Use the AWS CLI to spin up WorkSpaces</li></ul><h1 id="references-2"><a class="markdownIt-Anchor" href="#references-2"></a> References</h1><blockquote></blockquote><p><a href="https://youtu.be/9Q-ahnw2Lsc" target="_blank" rel="noopener">https://youtu.be/9Q-ahnw2Lsc</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Workspace </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Tibco -- FTL</title>
      <link href="2018/07/02/markdown/TechByVendorName/Tibco/FTL/"/>
      <url>2018/07/02/markdown/TechByVendorName/Tibco/FTL/</url>
      
        <content type="html"><![CDATA[<h1 id="tibco-ftl-community-edition"><a class="markdownIt-Anchor" href="#tibco-ftl-community-edition"></a> Tibco FTL Community Edition</h1><p>Default installation location on macOS:<br>the package is installed into /opt/tibco/ftl/5.4</p><h1 id="start-a-realm-server"><a class="markdownIt-Anchor" href="#start-a-realm-server"></a> Start a realm server</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo /opt/tibco/ftl/current-version/bin/tibrealmserver</span><br></pre></td></tr></table></figure><h1 id="start-developing-java-client"><a class="markdownIt-Anchor" href="#start-developing-java-client"></a> Start developing a Java client</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"># Install the FTL client jar into the local Maven repository</span><br><span class="line">mvn install:install-file -Dfile=/opt/tibco/ftl/current-version/lib/tibftl.jar -DgroupId=tibco.ftl.client \</span><br><span class="line">    -DartifactId=ftlclient -Dversion=5.4 -Dpackaging=jar</span><br><span class="line"></span><br><span class="line"># Also install the group jar if your code uses com.tibco.ftl.group.*</span><br><span class="line">mvn install:install-file -Dfile=/opt/tibco/ftl/current-version/lib/tibftlgroup.jar -DgroupId=tibco.ftl.client \</span><br><span class="line">    -DartifactId=ftlgroupclient -Dversion=5.4 -Dpackaging=jar</span><br></pre></td></tr></table></figure><p>When running Java, pass -Djava.library.path=/opt/tibco/ftl/current-version/lib so the JVM can find the FTL native libraries.</p><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>Doc home<br><a 
href="https://docs.tibco.com/pub/ftl/5.4.0/doc/html/GUID-C0EBAF36-A682-445D-B3EE-8E683330BE07-homepage.html" target="_blank" rel="noopener">https://docs.tibco.com/pub/ftl/5.4.0/doc/html/GUID-C0EBAF36-A682-445D-B3EE-8E683330BE07-homepage.html</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> Tibco </tag>
            
            <tag> Tibco FTL </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Config</title>
      <link href="2018/06/29/markdown/AWS/AWS2018/Extra_AWSConfig/"/>
      <url>2018/06/29/markdown/AWS/AWS2018/Extra_AWSConfig/</url>
      
        <content type="html"><![CDATA[<h1 id="aws-config"><a class="markdownIt-Anchor" href="#aws-config"></a> AWS Config</h1><p>Similar to tracking config changes in the Ambari console</p><h2 id="key-features"><a class="markdownIt-Anchor" href="#key-features"></a> Key Features</h2><ul><li>Record config --&gt; S3<ul><li>metadata;</li><li>attributes;</li><li>relationships (for example, an EBS volume attached to an EC2 instance);</li><li>current config;</li><li>related CloudTrail events</li></ul></li><li>Normalize</li><li>Store</li><li>Deliver: reports, snapshots<ul><li>can aggregate into a single S3 bucket from multiple accounts across multiple Regions</li><li>can use SNS -&gt; SQS to aggregate</li></ul></li></ul><h1 id="aws-config-rules"><a class="markdownIt-Anchor" href="#aws-config-rules"></a> AWS Config Rules</h1><p>Build on AWS Config delivery to set up rules that check config changes, visualize compliance, and identify offending changes.</p><h2 id="trigger-by-change"><a class="markdownIt-Anchor" href="#trigger-by-change"></a> Trigger by change</h2><ul><li>Tag: for example, when anything tagged “Production” changes, the rule is triggered</li><li>Resource type</li><li>Resource ID</li></ul><h2 id="trigger-by-frequency"><a class="markdownIt-Anchor" href="#trigger-by-frequency"></a> Trigger by frequency</h2><h1 id="use-cases"><a class="markdownIt-Anchor" href="#use-cases"></a> Use Cases</h1><ul><li><p>Security analysis</p></li><li><p>Audit compliance</p></li><li><p>Change management</p></li><li><p>Troubleshooting</p></li><li><p>Discovery</p></li><li><p>The AWS console provides multiple views: 
Rule-centric and resource-centric.</p></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://youtu.be/sGUQFEZWkho" target="_blank" rel="noopener">https://youtu.be/sGUQFEZWkho</a></p><h1 id="differenciation"><a class="markdownIt-Anchor" href="#differenciation"></a> Differentiation: AWS Config vs CloudTrail</h1><p><a href="https://www.sumologic.com/blog/amazon-web-services/aws-config-vs-cloudtrail/" target="_blank" rel="noopener">https://www.sumologic.com/blog/amazon-web-services/aws-config-vs-cloudtrail/</a><br>Both tools are helpful when implementing a self-serve IT policy. Config works well with CloudFormation to enable IT to create approved templates for every type of AWS resource. These templates, when shared with developers, let them provision the resources they require without needing IT’s approval every time. It speeds up development, and importantly, enforces a consistent level of quality across the organization. If an employee changes the template while creating the resource, Config catches the change and notifies IT of the violation. If IT wants to dig deeper, they can use CloudTrail to help discover who made the change, from where, and when.</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Config </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - IAM</title>
      <link href="2018/06/29/markdown/AWS/AWS2018/03_IAMNinja/"/>
      <url>2018/06/29/markdown/AWS/AWS2018/03_IAMNinja/</url>
      
        <content type="html"><![CDATA[<h1 id="10-iam-best-practises"><a class="markdownIt-Anchor" href="#10-iam-best-practises"></a> 10 IAM best practices</h1><h1 id="identity-and-credential-management"><a class="markdownIt-Anchor" href="#identity-and-credential-management"></a> Identity and Credential Management</h1><ol><li>Create individual users</li><li>Configure a strong credential (password) policy</li><li>Rotate credentials regularly</li><li>Enable MFA for privileged users (software or hardware)</li><li>Manage permissions with groups</li><li>Grant least privilege</li><li>Use IAM roles to share access (benefits: no password sharing and no need to store long-term credentials)</li></ol><ul><li>For example, the PROD account trusts the DEV account and defines a role with access to the database; the DEV account then allows specific users to assume that role to manage the database.</li></ul><ol start="8"><li>Use IAM roles for EC2 instances (launch the EC2 instance with a role)</li><li>Use CloudTrail to log API calls</li><li>Reduce or remove the use of root credentials</li></ol><ul><li>Fine-grained access control with resource tags</li></ul><h1 id="account-management"><a class="markdownIt-Anchor" href="#account-management"></a> Account management</h1><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote></blockquote><p><a href="https://youtu.be/SGntDzEn30s" target="_blank" rel="noopener">https://youtu.be/SGntDzEn30s</a></p><h1 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h1><p>The policy language has two parts:</p><ul><li>Specification: defining access policies</li><li>Enforcement: evaluating policies</li></ul><h2 id="policy-specification"><a class="markdownIt-Anchor" href="#policy-specification"></a> Policy Specification</h2><h3 id="speicification-parc-model"><a class="markdownIt-Anchor" href="#speicification-parc-model"></a> Specification: PARC Model</h3><ul><li>AWS policies use the PARC model:</li><li>P, Principal<ul><li>can be an AWS user, 
group or role, service, or federated user</li><li><a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html</a></li></ul></li><li>A, Action<ul><li>thousands of actions</li><li>Understand the difference between <strong>NotAction</strong> and <strong>Deny</strong> (important): NotAction matches every action except the listed ones, while Deny explicitly blocks the listed actions.</li></ul></li><li>R, Resource<ul><li>ARNs identifying AWS resources</li></ul></li><li>C, Condition<ul><li>multiple condition operators and keys are combined with AND; multiple values for a single key are combined with OR</li></ul></li></ul><h3 id="policy-variables"><a class="markdownIt-Anchor" href="#policy-variables"></a> Policy Variables</h3><ul><li>Specify the policy Version "2012-10-17": if the Version element is omitted, policy variables such as ${aws:username} are treated as literal strings</li></ul><h2 id="policy-enforcement"><a class="markdownIt-Anchor" href="#policy-enforcement"></a> Policy Enforcement</h2><ul><li>When a request is made, AWS retrieves all policies associated with the user and the resource</li><li>Filter the retrieved policies by action and conditions</li><li>Evaluate all Deny statements first; an explicit Deny always wins</li><li>Evaluate all Allow statements; if any statement matches, allow; otherwise deny (implicit deny)</li></ul><h3 id="demo1-limit-user-access-to-his-own-home-folder"><a class="markdownIt-Anchor" href="#demo1-limit-user-access-to-his-own-home-folder"></a> Demo1, limit user access to their own home folder</h3><h3 id="make-use-of-limited-iam-administrator"><a class="markdownIt-Anchor" href="#make-use-of-limited-iam-administrator"></a> Make use of a “limited” IAM administrator</h3><p>Demo1, a user who can create users but can only attach a specific list of policies</p><ul><li>Apply a policy for IAM access<ul><li>give the user list-users access; give the user full access to their own user</li></ul></li><li>Apply a policy to create users and attach policies (use a condition to limit the list of policies)</li></ul><p>Demo2, Grant Conditional Cross-Account Access</p><ul><li>Define a policy in the PROD account 
describing the required access and attach it to a role</li><li>Define a trust policy in the PROD account specifying which principals can assume the role</li><li>Define a policy in the DEV account allowing specific users to assume the role</li><li>Users can then switch roles (like switching accounts in Gmail)</li></ul><h1 id="improvements"><a class="markdownIt-Anchor" href="#improvements"></a> Improvements</h1><h2 id="ec2-fine-grained-policies"><a class="markdownIt-Anchor" href="#ec2-fine-grained-policies"></a> EC2 fine-grained policies</h2><ul><li>Resources identify EC2 resources by ARN, down to the instance ID</li><li>Use tags in conditions</li></ul><p>Demo3, prevent users from starting/stopping/terminating an instance unless they own it</p><ul><li>EC2 instances have an owner tag</li><li>A policy grants the user access to the EC2 console</li><li>A policy limits user access based on the owner tag</li></ul><p>Demo4, prevent users from launching expensive instances</p><ul><li>Use “NotAction” and “NotResource” to make sure we don’t miss any access needed to launch an instance</li><li>Allow specific actions on specific resources, and use a condition to limit the instance types</li></ul><p>Improvement: use IfExists (an ...IfExists condition evaluates to true when the key is absent from the request)</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span 
class="line">23</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">    <span class="attr">"Version"</span>: <span class="string">"2012-10-17"</span>,</span><br><span class="line">    <span class="attr">"Statement"</span>: [</span><br><span class="line">        &#123;</span><br><span class="line">            <span class="attr">"Effect"</span>: <span class="string">"Allow"</span>,</span><br><span class="line">            <span class="attr">"Action"</span>: <span class="string">"ec2:*"</span>,</span><br><span class="line">            <span class="attr">"Resource"</span>: <span class="string">"*"</span></span><br><span class="line">        &#125;,</span><br><span class="line">        &#123;</span><br><span class="line">            <span class="attr">"Effect"</span>: <span class="string">"Deny"</span>,</span><br><span class="line">            <span class="attr">"Action"</span>: [</span><br><span class="line">                <span class="string">"ec2:StartInstances"</span>,</span><br><span class="line">                <span class="string">"ec2:RunInstances"</span></span><br><span class="line">            ],</span><br><span class="line">            <span class="attr">"Resource"</span>: <span class="string">"arn:aws:ec2:*:123456789012:instance/*"</span>,</span><br><span class="line">            <span class="attr">"Condition"</span>: &#123;</span><br><span class="line">                <span class="attr">"StringNotLikeIfExists"</span>: &#123;</span><br><span class="line">                    <span class="attr">"ec2:InstanceType"</span>: [<span class="string">"t1.*"</span>, <span class="string">"t2.*"</span>, <span class="string">"m3.*"</span>]</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="policy-simulator"><a 
class="markdownIt-Anchor" href="#policy-simulator"></a> Policy simulator</h2><ul><li>Test your policies before applying them</li></ul><h2 id="decode-authorization-message-need-access"><a class="markdownIt-Anchor" href="#decode-authorization-message-need-access"></a> Decode authorization message (requires the sts:DecodeAuthorizationMessage permission)</h2><ul><li>Use the CLI (aws sts decode-authorization-message) to decode the encoded failure message</li><li>Format the decoded output with a JSON linter</li></ul><h1 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h1><p><a href="https://youtu.be/y7-fAT3z8Lo" target="_blank" rel="noopener">https://youtu.be/y7-fAT3z8Lo</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> IAM </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Maven - Refresh</title>
      <link href="2018/06/25/markdown/Java/Maven/Maven-KeyNotes/"/>
      <url>2018/06/25/markdown/Java/Maven/Maven-KeyNotes/</url>
      
        <content type="html"><![CDATA[<h1 id="key-terminology"><a class="markdownIt-Anchor" href="#key-terminology"></a> Key Terminology</h1><p><a href="https://maven.apache.org/guides/" target="_blank" rel="noopener">https://maven.apache.org/guides/</a></p><ul><li><p>Lifecycles</p></li><li><p>Project inheritance vs aggregation vs mixed</p></li></ul><blockquote><p><a href="https://maven.apache.org/guides/introduction/introduction-to-the-pom.html" target="_blank" rel="noopener">https://maven.apache.org/guides/introduction/introduction-to-the-pom.html</a></p></blockquote><ul><li>Build profiles<ul><li>per project</li><li>per user</li><li>global</li><li>profile descriptor (dynamically loaded in the project)</li></ul></li></ul><blockquote><p><a href="https://maven.apache.org/guides/introduction/introduction-to-profiles.html" target="_blank" rel="noopener">https://maven.apache.org/guides/introduction/introduction-to-profiles.html</a></p></blockquote><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"># Maven command syntax: -P takes a comma-separated list (no spaces); prefix a profile with ! to deactivate it</span><br><span class="line"></span><br><span class="line"># mvn groupId:artifactId:goal -Denvironment=test -P profile1,!profile2</span><br><span class="line"></span><br><span class="line">mvn com.mycom.app1:module1:deploy -Denvironment=test -P profile1,!profile2</span><br><span class="line">mvn <span class="built_in">help</span>:active-profiles -Denv=dev</span><br><span class="line">mvn <span class="built_in">help</span>:effective-pom -P appserverConfig-dev</span><br></pre></td></tr></table></figure><h1 id="maven-with-docker"><a class="markdownIt-Anchor" href="#maven-with-docker"></a> Maven with Docker</h1><p>Hands-on: build a Maven Java project as a Docker image<br><a 
href="https://examples.javacodegeeks.com/devops/docker/introduction-docker-java-developers/" target="_blank" rel="noopener">https://examples.javacodegeeks.com/devops/docker/introduction-docker-java-developers/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> DevOps </tag>
            
            <tag> Maven </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Tibco -- BusinessWorks</title>
      <link href="2018/06/22/markdown/TechByVendorName/Tibco/BusinessWorks/"/>
      <url>2018/06/22/markdown/TechByVendorName/Tibco/BusinessWorks/</url>
      
        <content type="html"><![CDATA[<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> Start the IDE</span></span><br><span class="line">~/tibco/bwce/studio/4.0/eclipse/TIBCOBusinessStudio.app/Contents/MacOS/TIBCOBusinessStudio</span><br><span class="line"></span><br><span class="line">rm -rf ~/Downloads/tempworkspace  # start from a clean temporary workspace</span><br><span class="line">mkdir ~/Downloads/tempworkspace</span><br><span class="line"></span><br><span class="line">cd ~/tibco/bwce/bwce/2.3/bin</span><br><span class="line">./bwdesign -data ~/Downloads/tempworkspace system:import -f /Users/ruiliu/Downloads/temp/workspacetest/tib_bw_ci_poc  # import the project into the workspace</span><br><span class="line">./bwdesign -data ~/Downloads/tempworkspace system:export hello.world.application ~/Downloads  # export the application to ~/Downloads</span><br><span class="line"></span><br><span class="line">ls ~/Downloads</span><br></pre></td></tr></table></figure><h1 id="quick-start"><a class="markdownIt-Anchor" href="#quick-start"></a> Quick Start</h1><p><a href="https://aws-quickstart.s3.amazonaws.com/quickstart-tibco-bwce/doc/tibco-bwce-on-the-aws-cloud.pdf" target="_blank" rel="noopener">https://aws-quickstart.s3.amazonaws.com/quickstart-tibco-bwce/doc/tibco-bwce-on-the-aws-cloud.pdf</a></p><ul><li>This pricing model enables you to pay only for the number of containers running per hour and gives you flexibility to scale on demand and manage software costs as you scale.</li></ul><p>Software Pricing Details<br>TIBCO BusinessWorks™ Container Edition and Plug-ins for AWS<br>$0.4/host/hr<br>Infrastructure Pricing Details<br>Estimated 
infrastructure cost: $5/month<br>The table shows current software and infrastructure pricing for services hosted in Asia Pacific (Singapore). Additional taxes or fees may apply.</p><p>TIBCO BusinessWorks™ Container Edition and Plug-ins for AWS</p><table><thead><tr><th>Unit type</th><th>Price per host per hour</th></tr></thead><tbody><tr><td>1 TIBCO BusinessWorks Consumption Unit</td><td>$0.2</td></tr><tr><td>1 TIBCO BWCE App Container = 5 TIBCO BusinessWorks Consumption Units</td><td>$1</td></tr><tr><td>1 TIBCO BW Plug-in = 2 TIBCO BusinessWorks Consumption Units</td><td>$0.4</td></tr></tbody></table><p><strong>ECS</strong><br>Estimated infrastructure costs are based on the default deployment configuration and 24x7 usage. Different CloudFormation configurations may result in different infrastructure costs.</p><p>EC2: 2 x m4.large machines or equivalent<br>ELB: 20 GB per month<br>NAT Gateway: 2 connections using 10 GB per month<br>EBS: 60 GB General Purpose SSD</p><p><strong>AMI</strong><br>Estimated infrastructure cost: $0.125/hr (EC2, running on m4.large)</p><p><strong>Docker</strong><br>Estimated infrastructure cost: $5/month<br>Estimated infrastructure costs are based on the following default deployment configuration and 24x7 usage. Different CloudFormation configurations may result in different infrastructure costs.</p><p>EC2: 1 x t2.medium machine or equivalent<br>S3: 6 GB per month<br>EBS: 30 GB General Purpose SSD</p>]]></content>
      
      
      
        <tags>
            
            <tag> Tibco </tag>
            
            <tag> BusinessWorks </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Glacier</title>
      <link href="2018/06/17/markdown/AWS/AWS2018/Extra_Glacier/"/>
      <url>2018/06/17/markdown/AWS/AWS2018/Extra_Glacier/</url>
      
        <content type="html"><![CDATA[<h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><p><strong>Vault</strong>: container for archives. 1,000 vaults per account per Region<br><strong>Archives</strong>: basic unit of backup. 40 TB max per archive. No limit on the number of archives.<br><strong>Inventory</strong>: cold <strong>index</strong> of archives (refreshed every 24 hours)</p><h1 id="access-glacier"><a class="markdownIt-Anchor" href="#access-glacier"></a> Access Glacier</h1><ol><li>SDK / API</li><li>S3 Lifecycle (bucket level or object level)</li></ol><ul><li>new feature: archive S3 objects by tag</li></ul><ol start="3"><li>3rd party tools &amp; gateways</li></ol><h1 id="upload-to-glacier"><a class="markdownIt-Anchor" href="#upload-to-glacier"></a> Upload to Glacier</h1><ul><li>Use the archive description to persist metadata (in case the local index is corrupted)</li><li>Aggregate data into MB-sized archives; small objects carry a lot of overhead when persisted into Glacier</li><li>Consider persisting file checksums with the local index</li><li>Consider persisting file offsets when files are aggregated; this helps retrieve individual files with range retrievals</li><li>Use multipart upload</li></ul><h1 id="data-management"><a class="markdownIt-Anchor" href="#data-management"></a> Data Management</h1><ul><li>Vault tags<ul><li>view billing by tag; configure security by tag</li></ul></li><li>Integrates with CloudTrail</li><li>Vault access policies: easy to control access and share content with other accounts</li><li><strong>Vault Lock</strong>: 24-hour test period before the lock is completed; after that the lock policy cannot be changed</li><li><strong>Vault Access Policy</strong>: gives more flexibility than Vault Lock. 
For example, use a <strong>Legal Hold</strong> tag on the vault</li></ul><h1 id="retrievals"><a class="markdownIt-Anchor" href="#retrievals"></a> Retrievals</h1><p>Steps to retrieve data</p><ul><li>Step 1, request a job (which vault, which archive ID, what range)</li><li>Step 2, processing (depending on the type of request, minutes or hours)</li><li>Step 3, job completion notification</li><li>Step 4, download the data</li></ul><p>Restore via S3 Lifecycle</p><ul><li>Request a file restore</li></ul><p>3 options, from fastest to cheapest: Expedited --&gt; Standard --&gt; Bulk (cheapest; up to 12 hours, suited to petabytes of data)</p><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>Deep Dive Glacier<br><a href="https://youtu.be/l8ug62pVbtw" target="_blank" rel="noopener">https://youtu.be/l8ug62pVbtw</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Glacier </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Athena</title>
      <link href="2018/06/17/markdown/AWS/AWS2018/Extra_Athena/"/>
      <url>2018/06/17/markdown/AWS/AWS2018/Extra_Athena/</url>
      
        <content type="html"><![CDATA[<h1 id="athena-query-on-s3"><a class="markdownIt-Anchor" href="#athena-query-on-s3"></a> Athena Query on S3</h1><ul><li>No data loading</li><li>Serverless<ul><li>supports multiple data formats (data lake)</li></ul></li><li>$5 per TB of data scanned from S3</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Athena </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Elastic Transcoder</title>
      <link href="2018/06/13/markdown/AWS/AWS2018/Extra_AWS_ElasticTranscoder/"/>
      <url>2018/06/13/markdown/AWS/AWS2018/Extra_AWS_ElasticTranscoder/</url>
      
        <content type="html"><![CDATA[<blockquote><p><a href="https://youtu.be/x20Qx7lWSLQ" target="_blank" rel="noopener">https://youtu.be/x20Qx7lWSLQ</a></p></blockquote><p>Demo</p><ul><li>Create a new pipeline<ul><li>source bucket (name, storage class, access)</li><li>target bucket (name, storage class, access)</li><li>thumbnails bucket (name, storage class, access)</li></ul></li><li>Create a new job<ul><li>select the pipeline</li><li>one input can have multiple outputs</li><li>define a playlist</li></ul></li></ul><p>Free tier (per month):<br>20 minutes of standard definition (SD) output<br>10 minutes of high definition (HD) output</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Elastic Transcoder </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - EBS</title>
      <link href="2018/06/13/markdown/AWS/AWS2018/Extra_EBS/"/>
      <url>2018/06/13/markdown/AWS/AWS2018/Extra_EBS/</url>
      
        <content type="html"><![CDATA[<h1 id="ebs-definition"><a class="markdownIt-Anchor" href="#ebs-definition"></a> EBS definition</h1><ul><li>Network block storage</li><li>Five 9s (99.999%) availability</li><li>Attached to EC2 instances in the same AZ</li><li>Point-in-time backup (snapshots) to S3</li></ul><h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><ul><li>IOPS: I/O operations (transactions) per second</li><li>Throughput: amount of data read/written per second</li><li>Latency: delay between request and response</li><li>Capacity: volume size</li><li>Block size: size of each I/O (KB)</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/extra_EBS_gp2.png?raw=true" alt="GP2 Bursting Diagram"></p><ul><li>Scenario: gp2 bursting speeds up boot time for Windows<ul><li>which is important for Auto Scaling</li></ul></li><li>Database (you can use 2 gp2 volumes to achieve 2 x 3K IOPS)</li></ul><h1 id="history"><a class="markdownIt-Anchor" href="#history"></a> History</h1><p>With EC2 -&gt; Magnetic storage -&gt; SSD storage -&gt; gp2 SSD storage -&gt; Volume encryption -&gt; Larger, faster volumes -&gt; Boot volume encryption -&gt; st1 and sc1 (HDD) -&gt; <strong>EBS Elastic Volumes</strong> (2017)</p><h2 id="iops-vs-throughput"><a class="markdownIt-Anchor" href="#iops-vs-throughput"></a> IOPS vs Throughput</h2><p>Select the correct type<br>IOPS-oriented: gp2, io1<br>Throughput-oriented: st1, sc1</p><h2 id="ebs-elastic-volumns"><a class="markdownIt-Anchor" href="#ebs-elastic-volumns"></a> EBS Elastic Volumes</h2><p>Changes:</p><ul><li>You can increase volume size (you can’t decrease it)</li><li>You can change volume type</li><li>You can increase or decrease IOPS</li><li>You can combine the above changes in one modification</li></ul><p>Limitation:</p><ul><li>the target configuration must be valid; for example, st1 has a minimum size of 500 GiB, so you can’t change to st1 with a size 
&lt;500 GiB</li></ul><table><thead><tr><th></th><th>Type</th><th>Min size</th><th>Max size</th><th>Min IOPS</th><th>Max IOPS</th><th>Min throughput (MB/s)</th><th>Max throughput (MB/s)</th></tr></thead><tbody><tr><td>gp2</td><td>SSD</td><td>1GiB</td><td>16TiB</td><td>100</td><td>3K (burst)</td><td></td><td></td></tr><tr><td>io1</td><td>SSD</td><td>4GiB</td><td>16TiB</td><td>100</td><td>20K (50:1)</td><td></td><td></td></tr><tr><td>sc1</td><td>HDD</td><td>500GiB</td><td>16TiB</td><td>N/A</td><td>N/A</td><td>12</td><td>192</td></tr><tr><td>st1</td><td>HDD</td><td>500GiB</td><td>16TiB</td><td>N/A</td><td>N/A</td><td>40</td><td>500</td></tr></tbody></table><h3 id="how-to-modify-elastic-volumns"><a class="markdownIt-Anchor" href="#how-to-modify-elastic-volumns"></a> How to modify Elastic Volumes</h3><ul><li>CLI</li></ul><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">aws ec2 modify-volume</span><br><span class="line">[--dry-run | --no-dry-run]</span><br><span class="line">--volume-id &lt;value&gt;</span><br><span class="line">[--size &lt;value&gt;]</span><br><span class="line">[--volume-type &lt;value&gt;]</span><br><span class="line">[--iops &lt;value&gt;]</span><br></pre></td></tr></table></figure><ul><li>other options: SDK, web console</li></ul><h3 id="benefits"><a class="markdownIt-Anchor" href="#benefits"></a> Benefits</h3><ul><li>No performance impact</li><li>No downtime</li><li>No over-provisioning</li></ul><h3 id="steps"><a class="markdownIt-Anchor" href="#steps"></a> Steps</h3><p>Step 1: modify<br>Step 2: monitor</p><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">aws ec2 describe-volumes-modifications</span><br><span class="line">[--volumn-ids &lt;value&gt; ]</span><br><span class="line">[--filters &lt;value&gt;]</span><br><span class="line">[--next-token &lt;value&gt;]</span><br><span class="line">[--max-results &lt;value&gt;]</span><br></pre></td></tr></table></figure><p>The status will show<br>“Modifying”–&gt; “optimizing”–&gt;“Completed”</p><p>Step3 Extend (if size is increased)</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">lsblk</span><br><span class="line">df -h</span><br><span class="line"><span class="meta">#</span><span class="bash"> ext2, ext3, ext4</span></span><br><span class="line">sudo resize2fs device_name</span><br><span class="line"><span class="meta">#</span><span class="bash"> xfs</span></span><br><span class="line">sudo xfs_growfs -d mount_point</span><br></pre></td></tr></table></figure><p>On widows , run diskmgmt.msc and choose to extend volumns .</p><p><strong>Limitation</strong>:</p><ul><li>max 1 change every 6 hours</li><li>once modification request raise, you can’t stop it.</li><li>you can’t change the encryption mode for existing volumn</li></ul><h3 id="bill"><a class="markdownIt-Anchor" href="#bill"></a> Bill</h3><p>You are charged at the moment you trigger the request with the target volumn.</p><h3 id="best-practise"><a class="markdownIt-Anchor" href="#best-practise"></a> Best practise</h3><ul><li><p>Security: lock down the modify-volumn api , treat it as the same as delete 
volumn.</p></li><li><p>Test Test Test before make any change.</p></li><li><p>Use GP2 as boot volumn</p></li><li><p>Tag your EBS snapshot (always)</p></li><li><p>Your EC2 CPU / Memory should match provisioned EBS</p></li><li><p>SSD EBS read / write didn’t different in performance</p></li><li><p>use Cloudwatch to  tuning EBS</p></li><li><p>Raid 0 (don’t use Raid 5 , or any raid with redundency)</p></li><li><p>Pre-warm : DD write for linux and  NTFS Format; use big block to do pre-warm (like 1M)</p></li><li><p>try to use ext4 or XFS; alignment = 4k (???)</p></li><li><p>Refer to the cheetsheet for common use case</p></li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>Dynamic EBS (2016)<br><a href="https://youtu.be/9vhR41YTp7E" target="_blank" rel="noopener">https://youtu.be/9vhR41YTp7E</a></p></blockquote><blockquote><p>aws elastic volumns<br><a href="https://github.com/aws-samples/aws-elastic-volumes" target="_blank" rel="noopener">https://github.com/aws-samples/aws-elastic-volumes</a></p></blockquote><blockquote><p>EBS Deep Dive (2015)<br><a href="https://youtu.be/gUYa7RzrNhM" target="_blank" rel="noopener">https://youtu.be/gUYa7RzrNhM</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS EBS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - KMS</title>
      <link href="2018/06/09/markdown/AWS/AWS2018/Extra_KMS/"/>
      <url>2018/06/09/markdown/AWS/AWS2018/Extra_KMS/</url>
      
        <content type="html"><![CDATA[<h1 id="background"><a class="markdownIt-Anchor" href="#background"></a> Background</h1><p>Application security design goals</p><p><strong>CIA</strong> (Confidentiality, Integrity, Availability)</p><ul><li><strong>Confidentiality</strong>:<ul><li>AWS uses the <strong>PARC</strong> model<ul><li>Principal, Action, Resource, Condition</li></ul></li></ul></li><li><strong>Integrity</strong></li><li><strong>Availability</strong>: how long it takes to encrypt/decrypt the data, and how long the customer can tolerate an outage if any part of the system becomes unavailable and needs to fail over<ul><li>For example, how long it takes to encrypt the data and write it to S3</li></ul></li></ul>
<h1 id="key-implement-to-meet-cia-requirements"><a class="markdownIt-Anchor" href="#key-implement-to-meet-cia-requirements"></a> Key implementations to meet CIA requirements</h1><ol><li>“Don’t store secrets as plaintext on disk” and “Decryption happens only in your instance”</li></ol><ul><li>Encryption and decryption happen only in your code, on your instance (not on the AWS service side).</li><li>Use the AWS KMS client SDK, the S3 encryption client, or the DynamoDB encryption client.</li><li><strong>Envelope encryption</strong>: each piece of data is encrypted with a random data key; that data key is then encrypted with a master key, and the encrypted data key is stored together with the encrypted data.</li></ul><ol start="2"><li>“Keep the ciphertext of secrets in multiple locations”</li></ol><ul><li>Use S3 (eleven 9s durability) or DynamoDB (if latency matters)</li></ul><ol start="3"><li>“Make sure secrets have not been changed since they were last used”</li><li>“If an instance can launch, its secrets should be accessible; &lt;1 min to provision the plaintext secret to the instance”</li></ol><ul><li><p>KMS exists in every Region (except China at the time 😦)</p></li><li><p>Decide carefully between retrieving secrets each time and caching them in memory.</p></li><li><p>The key policy is king! A key policy is not the same as an IAM policy: IAM policies grant access to a KMS key only if the key policy allows it.</p></li></ul>
<h1 id="case-study-okta"><a class="markdownIt-Anchor" href="#case-study-okta"></a> Case Study: Okta</h1><p>An okta is a unit of measure for cloud cover: from 0 to 8, it describes how many eighths of the sky are covered.</p><ul><li>Simple best practices<ul><li>Data from the database needs to be encrypted at rest or in memory</li><li>Encryption keys are kept only in memory</li><li>The service has access to the plaintext data</li></ul></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/Extra_KMS_Okta_mode.png?raw=true" alt="Okta Encryption Model"></p><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/EgJ9EQn0EJ8" target="_blank" rel="noopener">https://youtu.be/EgJ9EQn0EJ8</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS KMS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - WAF and Shield</title>
      <link href="2018/06/01/markdown/AWS/AWS2018/Extra_WAF_Shield/"/>
      <url>2018/06/01/markdown/AWS/AWS2018/Extra_WAF_Shield/</url>
      
        <content type="html"><![CDATA[<h1 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h1><table><thead><tr><th>Threats</th><th>DDoS</th><th>Application Attacks</th><th>Bad Bots</th></tr></thead><tbody><tr><td>Application Layer (7)</td><td>HTTP floods &lt;&lt;-- Shield Advanced</td><td>SQL injection; sensitive data exposure; social engineering; application exploits &lt;&lt;-- WAF</td><td>Crawlers; content scrapers; scanners &amp; probes &lt;&lt;-- WAF</td></tr><tr><td>Network Layer (3 &amp; 4)</td><td>Reflection; SSL abuse; amplification; Slowloris; layer 4 floods &lt;&lt;-- Shield Standard</td><td></td><td></td></tr></tbody></table><h1 id="ddos"><a class="markdownIt-Anchor" href="#ddos"></a> DDoS</h1><ul><li>Layer 3/4 DDoS<ul><li>SYN/UDP floods: a SYN flood works by never sending the ACK the server expects, so the server keeps each half-open connection waiting for the ACK until it times out, and these connections exhaust server resources.</li><li>Reflection attacks: the attacker sends requests to many third-party servers with the victim’s IP address spoofed as the source, so the servers send their responses to the victim; combined with amplification, small requests turn into a large flood of traffic.</li></ul></li><li>Layer 7 DDoS</li></ul>
<h1 id="key-features"><a class="markdownIt-Anchor" href="#key-features"></a> Key Features</h1><h2 id="aws-shield"><a class="markdownIt-Anchor" href="#aws-shield"></a> AWS Shield</h2><ul><li>Standard: layer 3/4 protection<ul><li>Always on: heuristics-based anomaly detection; baselining</li></ul></li><li>Advanced: layer 7 protection<ul><li>With Shield Advanced, WAF is included at no extra cost</li><li>Scaling costs caused by a DDoS attack are refunded (report the attack and get a credit)</li><li>Available for Application Load Balancer, Classic Load Balancer, CloudFront, S3 and Route 53</li><li>Integrates with CloudWatch for metrics and reports about the attack</li><li>Billing: once an enterprise buys the service, multiple accounts within that enterprise can share it</li></ul></li></ul>
<h2 id="aws-waf"><a class="markdownIt-Anchor" href="#aws-waf"></a> AWS WAF</h2><p>Feature summary:</p><ul><li><p>Filter traffic based on custom rules</p></li><li><p>Malicious request protection</p><ul><li>SQL injection</li><li>Process encrypting (???)</li></ul></li><li><p>Active monitoring and tuning</p></li><li><p>New rules are applied globally in less than 55 seconds</p></li><li><p>Less than 1 ms of inspection time when turned on</p></li><li><p>API &amp; SDK support for defining rules</p></li><li><p>Pre-configured rules</p></li></ul><p>How to use:</p><ul><li>Flexible custom rules</li><li>Pre-configured rules</li><li>Security Automations (combined with Lambda)</li></ul><p>Common use cases for WAF:</p><ul><li>IP reputation lists<ul><li>Can be deployed using CloudFormation</li><li>Updates the reputation list from 3 trusted sources</li></ul></li><li>HTTP floods<ul><li>Limit the number of HTTP requests per client in a 5-minute window</li></ul></li><li>Scanners &amp; probes<ul><li>Available to deploy using CloudFormation</li><li>Honeypot URL: a hidden link that legitimate users never follow; clients that request it are added to the block list</li></ul></li></ul>
<h3 id="demo-1-whitelist-good-user"><a class="markdownIt-Anchor" href="#demo-1-whitelist-good-user"></a> DEMO-1 Whitelist good users</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/Extra_WAF_WAFConfig.png?raw=true" alt="WAF config"><br>Define conditions --&gt; attach conditions to rules --&gt; attach rules to web ACLs --&gt; associate web ACLs with AWS resources (CloudFront, Application Load Balancer)</p><h3 id="demo-2-virtual-patching"><a class="markdownIt-Anchor" href="#demo-2-virtual-patching"></a> DEMO-2 Virtual Patching</h3><p>Example: Apache Struts vulnerability<br>When a condition is attached to a rule, you define whether requests matching the condition are blocked or allowed</p><p>Rate-based rule + URI string match condition = protection against brute-force login attempts</p><h3 id="demo-3-brute-force-on-login"><a class="markdownIt-Anchor" href="#demo-3-brute-force-on-login"></a> DEMO-3 Brute Force on Login</h3><p>When defining a rule, there are 2 options: “Regular rule” or “Rate-based rule”. For this scenario, we use a rate-based rule.</p><p>Define a rate-based rule with a “/login” URI match condition and a limit of 2000 requests per 5 minutes.</p>
<h1 id="owasp-top-10"><a class="markdownIt-Anchor" href="#owasp-top-10"></a> OWASP Top 10</h1><ul><li><p>A1: Injection</p></li><li><p>A2: Broken Authentication and Session Management</p><ul><li>Hard to distinguish legitimate users</li><li>Automatically update a blacklist of tokens when<ul><li>the same token is used from different locations</li><li>the login rate is abnormal</li></ul></li></ul></li><li><p>A3: Cross-Site Scripting (XSS)</p><ul><li>For example, on a blog platform a user publishes a post with an embedded script loaded from their own website; the script runs in the browser of whoever views that post and captures their key inputs</li><li>It’s easy to block content containing a script tag in the body, query string or cookie, but think carefully about other requirements, such as SVG graphics (which can contain script tags)</li></ul></li><li><p>A4: Broken Access Control</p><ul><li><a href="http://mywebsite/editprofile?userid=1234" target="_blank" rel="noopener">http://mywebsite/editprofile?userid=1234</a>; once authenticated, user 1234 can access <a href="http://mywebsite/editprofile?userid=4567" target="_blank" rel="noopener">http://mywebsite/editprofile?userid=4567</a><ul><li><strong>Mitigate</strong>: hard; possibly by matching a signature</li></ul></li><li><a href="http://mywebsite/download?file=file1.pdf" target="_blank" rel="noopener">http://mywebsite/download?file=file1.pdf</a>; once authenticated, a user can manipulate the file path and expose any file on the server (<a href="http://mywebsite/download?file=../../../../etc/passwd" target="_blank" rel="noopener">http://mywebsite/download?file=../../../../etc/passwd</a>)<ul><li>Directory traversal</li><li>Local file inclusion</li><li><strong>Mitigate</strong>: use WAF to match against ../ and ://</li></ul></li><li><a href="http://mywebsite/?module=myprofile&amp;action=edit" target="_blank" rel="noopener">http://mywebsite/?module=myprofile&amp;action=edit</a>; once authenticated, the user tries to visit <a href="http://mywebsite/?module=management&amp;action=edit" target="_blank" rel="noopener">http://mywebsite/?module=management&amp;action=edit</a><ul><li><strong>Mitigate</strong>: restrict the source if possible</li><li><strong>Mitigate</strong>: use Lambda@Edge (???)</li></ul></li></ul></li><li><p>A5: Security Misconfiguration</p><ul><li>Leaving the web server at <strong>ServerTokens Full</strong> (the default config), which exposes the exact version and components so attackers can exploit known vulnerabilities</li><li>Leaving default directory listing enabled</li><li>Returning stack traces in error pages</li><li>A PHP weakness (register_globals) lets request parameters be registered as global variables; attackers use it to overwrite globals, e.g. <a href="http://mywebsite/?_SERVER%5BDOCUMENT_ROOT%5D=http://attackerswebsite/bad.htm" target="_blank" rel="noopener">http://mywebsite/?_SERVER[DOCUMENT_ROOT]=http://attackerswebsite/bad.htm</a>, which changes the document root to another website.<ul><li><strong>Mitigate</strong>: block query strings containing _SERVER</li></ul></li><li><strong>Mitigate</strong>: use <strong>Amazon Inspector</strong> to check for common known misconfigurations</li><li><strong>Mitigate</strong>: use <strong>AWS Config</strong> and <strong>EC2 Systems Manager</strong> to track configuration changes over time.</li></ul></li><li><p>A6: Sensitive Data Exposure</p><ul><li>SHA-1 hashing algorithm: attackers can attempt to cause a <strong>hash collision</strong></li><li><strong>Mitigate</strong>: both ELB and CloudFront let you specify the allowed ciphers</li></ul></li><li><p>A7: Insufficient Attack Protection</p><ul><li>Submitting an abnormally large number of requests, or a single request with a huge payload</li><li><strong>Mitigate</strong>: rate-based rules &amp; size constraint rules</li><li><strong>Mitigate</strong>: <strong>WAF Security Automations</strong> with Lambda<ul><li>Lambda analyzes access logs to update the blocked IP list</li><li>Update the blocked IP list from reputation lists</li><li>Honeypot URL</li></ul></li></ul></li><li><p>A8: Cross-Site Request Forgery (CSRF)</p><ul><li>Different from XSS: CSRF exploits the trust a site has in the user’s browser</li><li>Embed this tag: <code>&lt;img src=&quot;http://www.examplebank.com/withdraw?account=Alice&amp;amount=1000&amp;for=Badman&quot;&gt;</code>; a user who has just logged in to online banking and whose browser loads it will transfer $1000 to Badman</li><li><strong>Mitigate</strong>: embed a hidden token (GUID) in the form or header.</li><li><strong>Mitigate</strong>: check that the Referer header comes from the correct source (won’t work if the browser implementation changes)</li></ul></li><li><p>A9: Using Components with Known Vulnerabilities</p><ul><li>CVE: Common Vulnerabilities and Exposures</li><li><strong>Mitigate</strong>: filter out components not used by your application</li><li><strong>Mitigate</strong>: <strong>penetration testing</strong>; requires AWS permission</li></ul></li><li><p>A10: Underprotected APIs</p><ul><li>Same as A1-A9, but for APIs</li></ul></li><li><p>Old A10: Unvalidated Redirects and Forwards</p><ul><li>URL shortener example: a user generates a short URL from a link such as <a href="http://mysite/link?target=http://badsite" target="_blank" rel="noopener">http://mysite/link?target=http://badsite</a></li><li><strong>Mitigate</strong>: whitelist the target URL or URI</li></ul></li></ul>
<h2 id="owasp-top-10-cloudformation-templates"><a class="markdownIt-Anchor" href="#owasp-top-10-cloudformation-templates"></a> OWASP Top 10 CloudFormation Templates</h2><blockquote><p><a href="https://github.com/aws-samples/aws-waf-sample/blob/master/waf-owasp-top-10/owasp_10_base.yml" target="_blank" rel="noopener">https://github.com/aws-samples/aws-waf-sample/blob/master/waf-owasp-top-10/owasp_10_base.yml</a></p></blockquote><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/W01f7g7slHw" target="_blank" rel="noopener">https://youtu.be/W01f7g7slHw</a></p></blockquote><p>Use AWS WAF to mitigate the OWASP Top 10<br><a href="https://d0.awsstatic.com/whitepapers/Security/aws-waf-owasp.pdf" target="_blank" rel="noopener">https://d0.awsstatic.com/whitepapers/Security/aws-waf-owasp.pdf</a></p><p>SHA-1 and hash collisions</p><blockquote><p><a href="https://zh.wikipedia.org/wiki/SHA-1" target="_blank" rel="noopener">https://zh.wikipedia.org/wiki/SHA-1</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS WAF </tag>
            
            <tag> AWS Shield </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>This Is My Architecture</title>
      <link href="2018/05/31/markdown/AWS/AWS2018/ThisIsMyArchitecture/"/>
      <url>2018/05/31/markdown/AWS/AWS2018/ThisIsMyArchitecture/</url>
      
        <content type="html"><![CDATA[<p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/MyArchi_VMWareSDDC.png?raw=true" alt="VMware SDDC"></p><p>VMware Software-Defined Data Center (VMware SDDC)</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> architecture sample </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Big Data Analytics Options on AWS</title>
      <link href="2018/05/19/markdown/AWS/AWS2018/Selection_BigDataAnalyticsOptions_AWS/"/>
      <url>2018/05/19/markdown/AWS/AWS2018/Selection_BigDataAnalyticsOptions_AWS/</url>
      
        <content type="html"><![CDATA[<h1 id="aws-key-advantages-cloud-advantages"><a class="markdownIt-Anchor" href="#aws-key-advantages-cloud-advantages"></a> AWS key advantages (Cloud advantages)</h1><ul><li>Flexibility, with failover across Availability Zones<ul><li>Different data collection options &amp; techniques:<ul><li>AWS Data Pipeline, AWS Import/Export Snowball, AWS Mobile Hub, AWS IoT, Kinesis Firehose</li></ul></li></ul></li></ul><h1 id="comparison"><a class="markdownIt-Anchor" href="#comparison"></a> Comparison</h1><h2 id="service-supported-data-volumn-and-antipatterns"><a class="markdownIt-Anchor" href="#service-supported-data-volumn-and-antipatterns"></a> Supported Data Volume and Anti-patterns per Service</h2><table><thead><tr><th>Service</th><th>Data Volume</th><th>Anti-patterns (alternative)</th></tr></thead><tbody><tr><td>Kinesis</td><td>Terabytes</td><td>Small, steady streaming throughput (&lt;200 KB/s)</td></tr><tr><td>AWS EMR</td><td>N/A</td><td>Small data; ACID transactions (RDB)</td></tr><tr><td>AWS ML</td><td>Max 100 GB</td><td>&gt;100 GB datasets (EMR); unsupported ML tasks (EMR)</td></tr><tr><td>DynamoDB</td><td>N/A</td><td>Joins/complex transactions (RDB); BLOBs (S3); low-I/O data (S3)</td></tr><tr><td>AWS Redshift</td><td>160 GB min. to petabytes</td><td>Unstructured data (DynamoDB); OLTP (RDB); BLOBs (S3)</td></tr><tr><td>AWS Elasticsearch Service</td><td>5 TB</td><td>&gt;5 TB (EMR, EC2); OLTP (RDB)</td></tr></tbody></table>
<h2 id="service-cost-model"><a class="markdownIt-Anchor" href="#service-cost-model"></a> Service Cost Model</h2><table><thead><tr><th>Service</th><th>Cost Model</th></tr></thead><tbody><tr><td>Kinesis</td><td>Per shard-hour + per 1 million PUT transactions</td></tr><tr><td>AWS EMR</td><td>Hourly EMR charge + the EC2 instances hosting EMR</td></tr><tr><td>AWS ML</td><td>Hourly rate to build the model + number of predictions generated (real-time prediction also charges for the memory reserved to run the model)</td></tr><tr><td>DynamoDB</td><td>Provisioned throughput per hour + data storage per month + data transfer in &amp; out (GB per month)</td></tr><tr><td>AWS Redshift</td><td>Node type and number of nodes</td></tr><tr><td>AWS Elasticsearch Service</td><td>Hourly rate + storage + data transfer</td></tr></tbody></table>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Big Data </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Data Migration using SQL Developer Tool (Oracle)</title>
      <link href="2018/05/14/markdown/TechByVendorName/OracleDB/SQLDeveloper_Tips/"/>
      <url>2018/05/14/markdown/TechByVendorName/OracleDB/SQLDeveloper_Tips/</url>
      
        <content type="html"><![CDATA[<h1 id="senario"><a class="markdownIt-Anchor" href="#senario"></a> Scenario</h1><p>Move more than 10k records from MS SQL database A to database B (different table design)</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">-- Run as a script (F5) in SQL Developer so the /*csv*/ hint is applied</span></span><br><span class="line">spool "C:\temp\data.csv"</span><br><span class="line"></span><br><span class="line"><span class="keyword">select</span> <span class="comment">/*csv*/</span></span><br><span class="line">    <span class="keyword">name</span> <span class="keyword">as</span> userName,</span><br><span class="line">    address <span class="keyword">as</span> userAddress</span><br><span class="line"><span class="keyword">from</span> customers;</span><br><span class="line"></span><br><span class="line">spool off;</span><br></pre></td></tr></table></figure><p>The records are written to a CSV file for further processing.</p><ul><li>Suitable for a temporary solution or an initial load</li><li>Values are exported wrapped in double quotes (“”), <strong>which I haven’t found a way to get rid of</strong>; I use Java to strip off the quotes before the next processing steps</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> SQL Developer </tag>
            
            <tag> Data Migration </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Security best practices</title>
      <link href="2018/05/13/markdown/AWS/AWS2018/BestPractices_AWS_Security/"/>
      <url>2018/05/13/markdown/AWS/AWS2018/BestPractices_AWS_Security/</url>
      
        <content type="html"><![CDATA[<h1 id="shared-responsibility-model"><a class="markdownIt-Anchor" href="#shared-responsibility-model"></a> Shared Responsibility Model</h1><h2 id="不同的类型服务有不同的shared-responsibility-model"><a class="markdownIt-Anchor" href="#不同的类型服务有不同的shared-responsibility-model"></a> Different types of services have different shared responsibility models</h2><ul><li>Infrastructure services<ul><li>including EC2, EBS, VPC</li></ul></li><li>Container services<ul><li>including RDS, EMR, AWS Elastic Beanstalk<ul><li>Compared with infrastructure services, AWS is also responsible for the OS and the application platform deployed on it</li></ul></li></ul></li><li>Abstracted services<ul><li>Serverless<ul><li>Compared with the previous type, AWS is also responsible for server-side and network encryption; the customer is basically responsible only for the client side</li></ul></li></ul></li></ul><h2 id="推荐使用trusted-advisor"><a class="markdownIt-Anchor" href="#推荐使用trusted-advisor"></a> Recommended: use “Trusted Advisor”</h2><ul><li>This security report is free unless you require the paid professional support services.</li><li>The checks include common open-port checks.</li></ul>
<h1 id="设计aws安全的步骤方法论"><a class="markdownIt-Anchor" href="#设计aws安全的步骤方法论"></a> Methodology for designing AWS security</h1><h2 id="第一步-define-and-categorize-assets-on-aws"><a class="markdownIt-Anchor" href="#第一步-define-and-categorize-assets-on-aws"></a> Step 1: Define and categorize assets on AWS</h2><p>Best practice: keep a list like this:</p><table><thead><tr><th>Asset name</th><th>Owner</th><th>Category</th><th>Dependency</th><th>Cost</th></tr></thead><tbody><tr><td>System name, e.g. LDAP</td><td>Who uses it</td><td>Importance: a core business application, or the network and software that support core business applications</td><td>Which AWS services it uses</td><td>Who pays</td></tr></tbody></table><h2 id="第二步-design-isms-information-security-management-system"><a class="markdownIt-Anchor" href="#第二步-design-isms-information-security-management-system"></a> Step 2: Design an ISMS (Information Security Management System)</h2><ul><li>Define the scope</li><li>Define the policy: objectives; which regulations to comply with; how to measure risk; how security plans are approved</li><li>Define the risk assessment methodology: business requirements; information security needs; IT support; legal requirements and obligations</li><li>Identify, analyze, evaluate and treat risks</li><li>Apply a security control framework</li><li>Get the ISMS plan approved by management</li><li>Document all of the above</li></ul><h1 id="security-strategy-for-aws-iam-service"><a class="markdownIt-Anchor" href="#security-strategy-for-aws-iam-service"></a> Security strategy for the AWS IAM service</h1>]]></content>
      
      
      
        <tags>
            
            <tag> AWS Best practise </tag>
            
            <tag> Security </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - AWS EMR best practices</title>
      <link href="2018/05/06/markdown/AWS/AWS2018/BestPractices_AWS_EMR/"/>
      <url>2018/05/06/markdown/AWS/AWS2018/BestPractices_AWS_EMR/</url>
      
        <content type="html"><![CDATA[<h1 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h1><p>Moving data to AWS --&gt; Data Collection --&gt; Data Aggregation --&gt; Data Processing --&gt; Cost and Performance Optimizations</p><h2 id="1-moving-data-to-aws"><a class="markdownIt-Anchor" href="#1-moving-data-to-aws"></a> 1. Moving data to AWS</h2><p>Means moving bulk of the existing data to AWS</p><ul><li>Moving Direction<ul><li>Local Storage &lt;–&gt; AWS S3<ul><li>Local HDFS --&gt; S3<ul><li>using S3DistCp: an extension of DistCp with optimizatios to AWS</li><li>using DistCp</li></ul></li><li>Local Filesystem --&gt; AWS S3<ul><li>opensourced tools support multi-threading: Jets3t / GNU Parallel</li><li>Aspera Direct-to-S3: a file transfer protocol based on UDP and with optimizations to AWS</li><li>Device based import/export</li><li>AWS Direct Connect<ul><li>One-time direct connection: once bulk data trasferred, stop the direct connection</li><li>On going direct connection: always connected</li></ul></li></ul></li><li>S3 --&gt; Local HDFS<ul><li>using S3DistCP or DistCP</li></ul></li></ul></li><li>AWS S3 --&gt; AWS EMR</li><li>AWS S3 --&gt; HDFS</li></ul></li><li>With good optimization: several Terabytes a day</li></ul><h2 id="2-data-collection"><a class="markdownIt-Anchor" href="#2-data-collection"></a> 2. Data Collection</h2><p>Means streaming data to AWS</p><ul><li>Apache Flume : collected data can be sent to S3, HDFS and more</li><li>Fluentd: collected data can be sent to S3, SQS , MongoDB, Redis and more</li></ul><h2 id="3-data-aggregation"><a class="markdownIt-Anchor" href="#3-data-aggregation"></a> 3. 
Data Aggregation</h2><p>Means, aggregated collected data at proper size before sending to target storage (S3, HDFS, EMR).</p><ul><li><p>Benefit of Aggregation</p><ul><li>aggregated files means less uploading times</li><li>aggregated files give better performance to Hadoop</li><li>aggregated files have better compression ratio</li></ul></li><li><p>How to Aggreate</p><ul><li>Apache Flume and Fluentd have parameters to support</li><li>S3DistCp has aggregation feature</li></ul></li><li><p>How to decide best aggregation size</p><ul><li>Background knowledge: how file is splitted<ul><li>If file is saved on HDFS, Hadoop will split the file into multiple data blocks and assign map task to each block</li><li>If file is saved in S3, EMR will use multiple get to get multiple data blocks of same file from S3 storage (default is about 64MB a block); but file size bigger than 4GB in GZIP format will result in logjams for EMR.</li><li>If the file is zipped , and with a format not supporting split (GZIP not supporting file splitting), then only 1 mapper task will be assigned</li></ul></li><li>Best practise 1: select correct aggregated file size<ul><li>GZIP 1-2GB</li><li>LZO (other format that support split): 2-4GB</li></ul></li><li>Best Practise 2: control the file size at best size<ul><li>it’s hard because most of the data collector only support time based file spliting. 
You might need to adjust the time to make sure the generated file fits in best size.</li></ul></li><li>Best Practise 3: select data compression algorithms<ul><li>If you choose GZIP format, then file should be around 1G, otherwise change to algorithms that support splitting</li><li>Why compression is good: less storage/IO/bandwidth</li><li>places that we can apply compression in MapReduce<ul><li>input file</li><li>mapper or reducer’s intermediate output: reduce the copy and reduce the data spill</li></ul></li><li>Summary, 4 considerations when choosing algorithm<ul><li>speed requirement for compress / decompress</li><li>data storage size</li><li>original aggregated file size (need split or not), if it’s too big , then have to select an algorithm that support splitting</li><li>native lib like gzip usually runs faster</li></ul></li></ul></li><li>Best Practise 4: Data Partitioning<ul><li>here means the organization of the data file should match with the way how you want to process it. For example, data analysised by date better been stored by date (different date, different directory)</li></ul></li></ul></li></ul><h2 id="4-data-processing-with-aws-emr"><a class="markdownIt-Anchor" href="#4-data-processing-with-aws-emr"></a> 4. 
Data processing with AWS EMR</h2><p>EMR will use HDFS or S3 to do mapreduce.</p><h3 id="picking-correct-instance"><a class="markdownIt-Anchor" href="#picking-correct-instance"></a> picking correct instance</h3><ul><li>Memory intensive job choose m prefix EC2 family</li><li>CPU intensive job choose C prefix EC2 family</li></ul><h3 id="picking-right-number-of-instances"><a class="markdownIt-Anchor" href="#picking-right-number-of-instances"></a> picking right number of Instances</h3><ul><li>If you want it faster, then make sure each split has a mapper task running in parallel, and depending on EC2 instance type, for example m1.small can run 2 mappers in parallel (m1.large can run 30 mappers), you can calculate the number of instances needed.</li><li>If you don’t care the speed, then just choose less number of instance, the tasks will be queued</li><li>EMR will charge by full hour.</li></ul><h3 id="estimating-number-of-mappers"><a class="markdownIt-Anchor" href="#estimating-number-of-mappers"></a> estimating number of mappers</h3><ul><li>Given the block size of S3 or HDFS, we know how a given file is splitted. 
Then we can calculate the mapper number needed.</li><li>Or the JobTracker GUI or output will tell</li></ul><h3 id="picking-emr-type"><a class="markdownIt-Anchor" href="#picking-emr-type"></a> picking EMR type</h3><ul><li>Transient EMR Cluster<ul><li>Suitable for data in S3; Processig is not all day long continious ; job is intensive ,iterative data processing</li><li>For machine learning, data load once , calculated multiple times.</li></ul></li><li>Persistent EMR Cluster</li></ul><h3 id="common-emr-patterns"><a class="markdownIt-Anchor" href="#common-emr-patterns"></a> Common EMR patterns</h3><ul><li>Data sits in S3 and EMR do not have local HDFS, it directly pull data from S3 and do map reduce</li><li>Data sits in S3 and EMR will copy to local HDFS and then do map reduce —&gt; Suitable for Machine Learning, copy once, calculate many times</li><li>Data sits in HDFS and backup &amp; result to S3 --&gt; suitable when data loss is acceptable</li><li>Manual Adjust the cluster by monitor: number of mappers running; number of mappers outstanding; number of reducers running; number of reducers outstanding;</li><li>Dynamic adjust the Cluster<ul><li>Master Node (JobTracker and NameNode); Core node(TaskTracker and DataNode); Task node (TaskTracker)</li><li>task node which do no contains HDFS storage can be add and removed dynamically</li><li>CloudWatch metrices can be used to decide how to adjust the cluster dynamically</li></ul></li></ul><h2 id="5-optimizing-cost"><a class="markdownIt-Anchor" href="#5-optimizing-cost"></a> 5. 
Optimizing Cost</h2><ul><li>&lt;17% planned use: On-Demand Instances</li><li>&gt;17% planned use: Reserved Instances<ul><li>Light Utilization reserved, for planned jobs that run a few hours</li><li>Medium Utilization reserved</li><li>Heavy Utilization reserved, for jobs that run around the clock</li></ul></li><li>Unplanned, short-lived work that can wait: Spot Instances<ul><li>The closer your bid is to the On-Demand price, the more likely the job is to complete</li><li>If the data is not in S3, do not run the master node on Spot Instances, and preferably keep core nodes off Spot as well (at least place most of them on On-Demand or Reserved Instances).</li></ul></li></ul><h1 id="6-performance-optimization"><a class="markdownIt-Anchor" href="#6-performance-optimization"></a> 6. Performance Optimization</h1><ul><li>Data structure is key to performance</li><li>Hadoop is a batch-processing system whose jobs are measured in hours to days; if you need to cut a few minutes to meet an SLA, look at Storm or Spark instead</li><li>EMR charges in hourly increments.</li><li>Make use of task nodes</li></ul><h2 id="benchmark-testing"><a class="markdownIt-Anchor" href="#benchmark-testing"></a> benchmark testing</h2><ul><li>Eliminate variables while testing. For example, load the data into HDFS if you are testing CPU and memory; otherwise data-loading time from S3 might become the bottleneck.</li><li>Use Ganglia to monitor your EMR cluster.</li></ul><p>Refer to the original document for other performance-testing tips.</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS Best practise </tag>
            
            <tag> AWS EMR (Elastic MapReduce) </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Summarize</title>
      <link href="2018/05/04/markdown/AWS/AWS2018/aws_summerize/"/>
      <url>2018/05/04/markdown/AWS/AWS2018/aws_summerize/</url>
      
        <content type="html"><![CDATA[<h1 id="service-saling-summary"><a class="markdownIt-Anchor" href="#service-saling-summary"></a> Service Scaling Summary</h1><table><thead><tr><th>Service Name</th><th>Scaling/Failover capability</th><th>Comments</th></tr></thead><tbody><tr><td>AutoScaling</td><td>AZ: Yes; Region: No</td><td>Horizontal scaling of EC2 instances in the same Auto Scaling group</td></tr><tr><td>Elastic Cache</td><td>Multi-AZ failover</td><td></td></tr><tr><td>Route53</td><td>Global service</td><td></td></tr><tr><td>CloudFront</td><td>Global service</td><td></td></tr><tr><td>VPC</td><td>Spans multiple AZs; Region: No</td><td></td></tr><tr><td>RDS</td><td>Multi-AZ; Multi-Region (cross-region read replica)</td><td></td></tr></tbody></table><h1 id="calcuation"><a class="markdownIt-Anchor" href="#calcuation"></a> Calculation</h1><p>EBS --&gt; gp2 (General Purpose SSD) volume type --&gt; IOPS scale with volume size (baseline of 3 IOPS per GiB)</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Storage Gateway best practices</title>
      <link href="2018/04/26/markdown/AWS/AWS2018/BestPractices_AWSStorageGateway/"/>
      <url>2018/04/26/markdown/AWS/AWS2018/BestPractices_AWSStorageGateway/</url>
      
        <content type="html"><![CDATA[<h1 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h1><p>Use Storage Gateway to help implement a hybrid architecture.</p><ul><li>Use the gateway to bridge NFS and Amazon S3: it translates NAS protocol operations into S3 API calls</li><li>AWS EMR or Athena in the cloud can directly access the data backed up to S3</li></ul><h1 id="architecture"><a class="markdownIt-Anchor" href="#architecture"></a> Architecture</h1><ul><li>Storage Gateway is a virtual appliance</li><li>Supports NFS v3.0 or v4.1</li><li>Local storage is used to provide a read/write cache</li><li><strong>&quot;Bucket Share&quot;</strong>: a share represents the mapping of an S3 bucket to an NFS mount point (the relationship between an S3 bucket and a bucket share is one-to-one)</li><li>A gateway supports at most 10 bucket shares</li></ul><h1 id="file-to-object-mapping"><a class="markdownIt-Anchor" href="#file-to-object-mapping"></a> File to object mapping</h1><ul><li>Standard Unix file permissions (owner, group, permissions) and timestamps are mapped into the metadata of the corresponding S3 object</li></ul><h1 id="readwrite操作和本地cache"><a class="markdownIt-Anchor" href="#readwrite操作和本地cache"></a> Read/Write operations and the local cache</h1><ul><li>An LRU (least recently used) algorithm is used to evict data</li><li>Read operation (read-through cache): read from the cache first; on a miss, fetch the data over the network</li><li>Write operation (write-back cache): write to the cache first (parallel writes to local storage), then asynchronously write the changed parts back over the network</li></ul><h1 id="nfs-security-in-lan"><a class="markdownIt-Anchor" href="#nfs-security-in-lan"></a> NFS Security in LAN</h1><ul><li>The mounted file system has default Unix permissions defined</li><li>Access can be restricted by source host or range of hosts (by defining a CIDR block or individual IP addresses)</li></ul><h1 id="monitoring-cache-and-traffic"><a class="markdownIt-Anchor" href="#monitoring-cache-and-traffic"></a> Monitoring Cache and Traffic</h1><ul><li>The file gateway publishes statistics as Amazon CloudWatch metrics</li><li>Covered: cache consumption, cache hits/misses, data transfer, and read/write activity</li></ul><h1 id="file-gateway-bucket-inventory"><a class="markdownIt-Anchor" href="#file-gateway-bucket-inventory"></a> File Gateway Bucket Inventory</h1><ul><li>If one bucket is associated with 
more than one gateway, gateway A is not aware of changes made through gateway B</li><li>If an S3 object is modified outside the NFS share, the gateway must run RefreshCache (via the API or CLI) to pick up the changes</li></ul><h1 id="bucket-shares-with-multi-contributors"><a class="markdownIt-Anchor" href="#bucket-shares-with-multi-contributors"></a> Bucket Shares with Multiple Contributors</h1><ul><li>There is NO LOCKING or COHERENCY across file gateways</li><li>A change is picked up<ul><li>if the object has only been listed, not yet read (loaded), when the change happens; it is then loaded when it is first read</li><li>when the gateway “RefreshCache” API / CLI is executed</li></ul></li><li>Best practice: one writer gateway, multiple reader gateways<ul><li>there is a “read-only” mount option</li></ul></li></ul><h1 id="amazon-s3-and-file-gateway"><a class="markdownIt-Anchor" href="#amazon-s3-and-file-gateway"></a> Amazon S3 and File Gateway</h1><ul><li><p>AWS file gateway</p><ul><li>Can be backed by S3 or S3-IA</li></ul></li><li><p>Both S3 and S3-IA support lifecycle policies, based on object creation date or tag key-value pairs</p></li><li><p>S3</p><ul><li>99.999999999% (eleven 9s) durability</li><li>Tier 1; objects can be transitioned to S3-IA or Glacier</li></ul></li><li><p>S3 - IA (Infrequent Access)</p><ul><li>Lower availability than S3 (designed for 99.9%)</li><li>Suited to objects larger than about 128 KB that are kept for more than 30 days</li></ul></li><li><p>Glacier</p></li></ul><h2 id="transition"><a class="markdownIt-Anchor" href="#transition"></a> Transition</h2><ul><li>Objects transitioned to S3-IA remain visible to the file gateway (read/write)</li><li>Once a lifecycle rule transitions an object to Glacier, it can only be listed, not read or written (I/O error)</li></ul><h2 id="结合s3的crr-cross-region-replication"><a class="markdownIt-Anchor" href="#结合s3的crr-cross-region-replication"></a> Combining with S3 CRR (Cross-Region Replication)</h2><h1 id="使用案例"><a class="markdownIt-Anchor" href="#使用案例"></a> Use Cases</h1><h2 id="cloud-tiering"><a class="markdownIt-Anchor" href="#cloud-tiering"></a> Cloud Tiering</h2><p>Use the flexibility of S3 to improve the durability of existing data-center storage; pay only for what you use; virtually unlimited storage expansion; and low latency via the gateway</p><h2 id="hybrid-cloud-backup"><a 
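class="markdownIt-Anchor" href="#hybrid-cloud-backup"></a> Hybrid Cloud Backup</h2><ul><li>The gateway provides low-latency 5 TB object storage (???); more can be achieved by scaling out</li><li>Common pattern: data less than 30 days old in S3, 30-60 days old in S3-IA, older than 60 days in Glacier, deleted after 1 year.</li></ul><p>Notes:</p><ul><li>When designing, size the local cache to hold a complete backup (otherwise, restoring file data that is not in the local cache means fetching it from S3 over the WAN)</li></ul>]]></content>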
      
      
      
        <tags>
            
            <tag> AWS Best practise </tag>
            
            <tag> Storage Gateway </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Data Security</title>
      <link href="2018/04/23/markdown/AWS/AWS2018/036_DataSecurity/"/>
      <url>2018/04/23/markdown/AWS/AWS2018/036_DataSecurity/</url>
      
        <content type="html"><![CDATA[<h1 id="encryption-and-key-management-in-aws"><a class="markdownIt-Anchor" href="#encryption-and-key-management-in-aws"></a> Encryption and Key Management in AWS</h1><blockquote><p>Encryption and key management in AWS --2015<br><a href="https://youtu.be/uhXalpNzPU4" target="_blank" rel="noopener">https://youtu.be/uhXalpNzPU4</a></p></blockquote><h2 id="encryption-primer"><a class="markdownIt-Anchor" href="#encryption-primer"></a> Encryption Primer</h2><ul><li>Encrypt the data using a <strong>symmetric key</strong> and store the encrypted data</li><li>Use a <strong>master key</strong> to encrypt the <strong>symmetric key</strong> and store the encrypted key</li><li>Use a higher-level <strong>master master key</strong> to encrypt the master key, and store the encrypted master key</li><li>…; together these keys form a <strong>key hierarchy</strong>, stored in an HSM<ul><li>This reduces the <strong>blast radius</strong> of losing a single key</li></ul></li></ul><h2 id="client-side-encryption"><a class="markdownIt-Anchor" href="#client-side-encryption"></a> Client Side Encryption</h2><ul><li>The customer encrypts the data and manages the keys</li><li>For client-side-encrypted data that will be stored in S3, the AWS SDK can simplify the approach<ul><li>With the AWS SDK, your master key stays on premises, while the encrypted symmetric key and the encrypted data are saved in S3</li></ul></li></ul><h2 id="server-side-encryption"><a class="markdownIt-Anchor" href="#server-side-encryption"></a> Server Side Encryption</h2><ul><li>AWS encrypts the data and manages the keys<ul><li>Upload raw data via TLS to AWS (S3, Glacier, EBS, Redshift, RDS, etc.), then enable encryption<ul><li>With AWS-managed keys, AWS generates a unique key for each object and manages it using an internal S3 service</li><li>With a customer-provided key (SSE-C), AWS encrypts the data using the customer’s key and discards the key after encryption<ul><li>When requesting the encrypted data, you must provide the key again; AWS decrypts the data and returns it 
back</li></ul></li></ul></li></ul></li><li>For S3/EBS/RDS/Redshift server-side encryption, you have 2 options:<ul><li>Use the S3/EBS/RDS/Redshift service master key (whoever has access to the bucket can decrypt the data)</li><li>Use AWS KMS, which lets you specify which master key to use when encrypting the data</li></ul></li></ul><h2 id="key-management-options"><a class="markdownIt-Anchor" href="#key-management-options"></a> Key management Options</h2><ul><li>Self-managed</li><li>AWS Key Management Service<ul><li>Call the API to generate a data key; both the encrypted and the plaintext copy of the key are returned<ul><li>The plaintext key is used to encrypt the data; the encrypted key is stored locally</li><li>When decryption is needed, the client submits the locally stored encrypted key to KMS</li><li>The master key is always stored in KMS.</li><li>Benefit: KMS has more fine-grained access control, so encrypted data can only be decrypted by users who have access to the key.</li><li>Better auditing</li><li>Plaintext keys never exist in any persistent storage; AWS service operations teams are fully separated from the KMS team; multi-party control</li></ul></li></ul></li><li>AWS Partner solutions<ul><li>Browse the AWS Marketplace for security solutions</li></ul></li><li>AWS CloudHSM: HSM stands for hardware security module<ul><li>A dedicated device used to store the keys</li><li>Only the customer has access to the module</li><li>You can use the official CloudFormation template to provision it</li><li>Supports Oracle/SQL Server (running on EC2) encryption with the HSM</li><li>Supports EBS storage encryption</li><li>Supports Redshift (the only one)</li></ul></li><li>KMS vs HSM<ul><li>KMS uses an HSM platform underneath, but not a dedicated HSM</li><li>CloudHSM is useful for complying with government standards</li></ul></li></ul><h1 id="102mp4-103mp4-data-security"><a class="markdownIt-Anchor" href="#102mp4-103mp4-data-security"></a> 102.mp4 103.mp4 – Data Security</h1><ul><li><strong>Distributed denial-of-service (DDoS) attacks</strong></li><li><strong>Man In the Middle (MITM) attacks</strong> : 
the attacker intercepts and replaces certificates in order to modify message contents in transit (mitigation: avoid trusting self-signed certificates)</li><li><strong>IP Spoofing</strong> : forging the source IP address (???)</li><li><strong>Packet Sniffing</strong> : capturing network traffic, like what Wireshark can do</li></ul><p>==========================<br>X.509 certificate</p><p>HMAC-SHA256 signing</p><p>CloudHSM??</p><p>IPsec VPN</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Data Security </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Direct Connect</title>
      <link href="2018/04/23/markdown/AWS/AWS2018/037_DirectConnect/"/>
      <url>2018/04/23/markdown/AWS/AWS2018/037_DirectConnect/</url>
      
        <content type="html"><![CDATA[<h1 id="104mp4-105mp4-aws-direct-connect"><a class="markdownIt-Anchor" href="#104mp4-105mp4-aws-direct-connect"></a> 104.mp4 105.mp4 – AWS Direct Connect</h1><ul><li>Dedicated network connection between an on-premises network and AWS<ul><li>1 Gbps or 10 Gbps fiber</li><li>802.1Q VLANs</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS Direct Connect </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Architecture Design</title>
      <link href="2018/04/22/markdown/AWS/AWS2018/035_ArchitectureDesign/"/>
      <url>2018/04/22/markdown/AWS/AWS2018/035_ArchitectureDesign/</url>
      
        <content type="html"><![CDATA[<h1 id="100mp4-101mp4-architecture-design"><a class="markdownIt-Anchor" href="#100mp4-101mp4-architecture-design"></a> 100.mp4 101.mp4 – Architecture Design</h1><h2 id="architecture-design"><a class="markdownIt-Anchor" href="#architecture-design"></a> Architecture Design</h2><ul><li><p>Security</p></li><li><p>Reliability</p></li><li><p>Performance</p></li><li><p>Cost Optimization</p><ul><li>Costs</li><li>Suboptimal Resources</li></ul></li><li><p>Operational Excellence</p><ul><li>Maximize business value</li><li>Continuous improvement</li></ul></li><li><p>Production Scale Testing</p></li><li><p>Data-Driven Architecture</p></li><li><p>Chaos Monkey</p></li><li><p><strong>Forensic Clean</strong> ???</p></li><li><p><strong>WAF</strong> : web application firewall</p></li><li><p><strong>Penetration Testing</strong> : requires notifying AWS in advance</p></li></ul><h2 id="design-principles"><a class="markdownIt-Anchor" href="#design-principles"></a> Design Principles</h2><ul><li><strong>Mechanical Sympathy</strong><ul><li><a href="https://github.com/jjfidalgo/mechanical-sympathy" target="_blank" rel="noopener">https://github.com/jjfidalgo/mechanical-sympathy</a></li></ul></li><li>Storage: select from block, file, or object storage</li></ul><h2 id="cost-optimization"><a class="markdownIt-Anchor" href="#cost-optimization"></a> Cost Optimization</h2><ul><li><p>Analyze and attribute expenditure</p></li><li><p><strong>AWS Trusted Advisor</strong> : feature</p></li><li><p>Runbook (how to run daily operations) and playbook (how to handle specific situations)</p></li></ul><p><a href="http://aws.amazon.com/architecture" target="_blank" rel="noopener">aws.amazon.com/architecture</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Architecture Design </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudTrail</title>
      <link href="2018/04/22/markdown/AWS/AWS2018/034_CloudTrail/"/>
      <url>2018/04/22/markdown/AWS/AWS2018/034_CloudTrail/</url>
      
        <content type="html"><![CDATA[<h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><p><a href="https://youtu.be/vtMCjyE5nms" target="_blank" rel="noopener">https://youtu.be/vtMCjyE5nms</a></p><h2 id="aws-cloudtrail-off-ir-incident-response-runbook"><a class="markdownIt-Anchor" href="#aws-cloudtrail-off-ir-incident-response-runbook"></a> AWS CloudTrail OFF IR (Incident Response) Runbook</h2><p>When someone turns off CloudTrail, it is automatically turned back on, and an automated report and reminder are generated.</p><p>Automation steps:</p><ol><li>Turn CloudTrail back on<ul><li>Use Python in Lambda to handle the turn-off event and turn logging back on.</li></ul></li><li>Gather data related to the “TURN OFF” incident</li><li>Extract the principal, date, time, and source IP from the event data</li><li>Map the principal to the identity that assumed the role</li><li>Look up human contact info</li><li>Contact the human and provide guidance</li><li>Generate an event summary for the report</li></ol><h2 id="questionair-before-implementation"><a class="markdownIt-Anchor" href="#questionair-before-implementation"></a> Questionnaire before implementation</h2><ol><li>What’s my expressed security objective, in words?</li><li>Is it configuration- or behavior-related?</li><li>What data, and from where, could help inform me?</li><li>Do I have the requisite ownership or visibility?</li><li>What are my performance requirements?</li><li>What mechanisms support the above?</li><li>What is my expressed security objective in code?</li><li>Am I done?</li><li>Does a human need to look at this? 
When?</li></ol><h2 id="demo-s3putbucketpolicy-ir-runbook"><a class="markdownIt-Anchor" href="#demo-s3putbucketpolicy-ir-runbook"></a> Demo - S3:PutBucketPolicy IR Runbook</h2><p>When someone changes an S3 bucket policy in a harmful way, check the policy and restore it if needed. This runbook uses Step Functions to implement it.</p><h2 id="demo-ec2-login-ir-runbook-user-login"><a class="markdownIt-Anchor" href="#demo-ec2-login-ir-runbook-user-login"></a> Demo - EC2 Login IR Runbook (User Login)</h2><p>When someone logs into the instance (anyone holding the key can), check the login behavior together with other relevant data, then decide whether to isolate the instance.<br>This runbook uses CloudWatch Events (CloudWatch triggers a custom JSON event when an EC2 login happens).</p><p>Identify</p><ol><li>Get the user</li><li>Gather relevant data</li><li>Terminate the session</li><li>Isolate the instance</li><li>Report the incident</li></ol><p>Research</p><ol><li>Pull logs</li><li>Correlate other data</li><li>Report findings</li></ol><p>Forensics</p><ol><li>Memory dump</li><li>Create an AMI and copy it to the forensics account</li><li>Launch an instance, investigate, and report findings</li></ol><h2 id="demo"><a class="markdownIt-Anchor" href="#demo"></a> Demo</h2><h1 id="cloudfront-deepdive-sec318"><a class="markdownIt-Anchor" href="#cloudfront-deepdive-sec318"></a> CloudTrail Deep Dive – SEC318</h1><h2 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h2><blockquote><p><a href="https://youtu.be/t0e-mz_I2OU" target="_blank" rel="noopener">https://youtu.be/t0e-mz_I2OU</a></p></blockquote><h2 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> overview</h2><ul><li>A real sample showing who, when, what, to whom, and where:<ul><li>A JSON-format log: a user ARN, at time xxx, made an API call “start log” to a resource ARN in the Sydney region, from IP address xxx, with a given browser (user agent).</li></ul></li><li>Turn on 
CloudTrail<ul><li>You can specify a new S3 bucket or an existing one.</li><li>You can turn it on from the web console or the AWS CLI</li><li>Two steps to turn it on: define the trail, then start the trail</li></ul></li><li>Aggregate multiple accounts’ CloudTrail logs into one bucket<ul><li>&lt;bucket_name&gt;/optional_prefix/AWSLogs/AccountId/CloudTrail/Region/YYYY/MM/DD/filename.json.gz</li></ul></li></ul><h2 id="monitoring-and-notification"><a class="markdownIt-Anchor" href="#monitoring-and-notification"></a> monitoring and notification</h2><ul><li>Helps prevent <strong>high blast radius</strong> behaviors</li><li>You can make use of predefined transformations<ul><li>Configure CloudTrail to log API calls into S3</li><li>Configure CloudWatch Logs to receive the logs</li></ul></li><li>View the logs<ul><li>The CloudTrail console can be used directly</li><li>Use the AWS CLI</li></ul></li></ul><h2 id="encrpt-cloudtrail-log"><a class="markdownIt-Anchor" href="#encrpt-cloudtrail-log"></a> encrypt CloudTrail logs</h2><ul><li>By default, CloudTrail logs are encrypted server-side (using SSE-S3)</li><li>You can choose to use KMS instead: create a key, use it, and grant decrypt access to log readers<ul><li>Create the key and get its ARN</li><li>Attach a policy to the key</li><li>Attach a policy to the IAM user/group so it can use the key for decryption</li><li>Update the trail to use the key</li></ul></li><li>You can use the same KMS key for multiple accounts<ul><li>When attaching the policy to the key, allow it to encrypt trails belonging to multiple accounts; then update those accounts’ trails to use the key</li></ul></li></ul><h2 id="validating-the-file-integrity-of-cloudtrail-logs"><a class="markdownIt-Anchor" href="#validating-the-file-integrity-of-cloudtrail-logs"></a> Validating the file integrity of CloudTrail logs</h2><ul><li>Turn it on by applying --enable-log-file-validation to the trail<ul><li>CloudTrail generates a signed digest file every hour</li><li>Digest files are delivered to a dedicated S3 folder 
&lt;bucket_name&gt;/optional_prefix/AWSLogs/AccountId/CloudTrail-Digest</li><li>Use the AWS CLI (aws cloudtrail validate-logs) with the digest files to validate the log files, or use your own tool<ul><li>You need access to decrypt and view the log files and the digest files to do so</li></ul></li></ul></li></ul><h1 id="098mp4-099mp4-cloudtrail"><a class="markdownIt-Anchor" href="#098mp4-099mp4-cloudtrail"></a> 098.mp4 099.mp4 – CloudTrail</h1><p>Logs:</p><ul><li>who (user)</li><li>when (timestamp)</li><li>action (API call)</li><li>to whom (resource)</li><li>where (region)</li></ul><p>Helps with: security, compliance, troubleshooting</p><ul><li>Logs are persisted into S3 in JSON format, using server-side encryption.</li><li>Integrates with CloudWatch Logs (to generate alarms or to send to Kinesis) and SNS</li></ul><p>For CloudTrail logging of S3 object-level (data) events:</p><ul><li>$0.10 per 100,000 data events</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CloudTrail </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Elastic Cache hands on</title>
      <link href="2018/04/22/markdown/AWS/AWS2018/032_ElasticCacheHandson/"/>
      <url>2018/04/22/markdown/AWS/AWS2018/032_ElasticCacheHandson/</url>
      
        <content type="html"><![CDATA[<h1 id="093mp4-094mp4-elastic-cache-hands-on"><a class="markdownIt-Anchor" href="#093mp4-094mp4-elastic-cache-hands-on"></a> 093.mp4 094.mp4 – Elastic Cache hands on</h1><p>An important use case is maintaining application session state (session replication)</p><ul><li>Create a security group for the Redis service named RedisSG.<ul><li>Allow inbound port 6379 from webserverSG</li><li>Allow outbound port 6379 to anywhere</li></ul></li><li><strong>Cache subnet group</strong><ul><li>Similar to an RDS DB subnet group.</li></ul></li><li>Use the wizard to create the Elastic Cache cluster<ul><li>Options: enable replication; enable Multi-AZ</li><li>Instance type: cache.t2.micro</li><li>Option: seed RDB file location in an S3 bucket</li><li>Select the newly created cache subnet group and the newly created security group</li><li>Optional: maintenance window, SNS</li></ul></li><li>Review the result (1 node)</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Elastic Cache </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Elastic Cache</title>
      <link href="2018/04/22/markdown/AWS/AWS2018/031_ElasticCache/"/>
      <url>2018/04/22/markdown/AWS/AWS2018/031_ElasticCache/</url>
      
        <content type="html"><![CDATA[<h1 id="elastic-cache"><a class="markdownIt-Anchor" href="#elastic-cache"></a> Elastic Cache</h1><ul><li>Managed in-memory cache service</li><li>Key-value stores</li><li>Sub-millisecond latency to data</li><li>Redis / Memcached data store options</li><li>Multi-AZ capability</li><li>Increases application throughput: 20M reads/sec, 4.8M writes/sec</li><li>Scaling the DB layer is much more expensive than scaling the caching layer</li></ul><h2 id="compare-the-two-options"><a class="markdownIt-Anchor" href="#compare-the-two-options"></a> Compare the two options</h2><ul><li>The choice depends on project language &amp; framework support</li><li>Redis features are a superset of Memcached's</li></ul><h2 id="memcached-vs-redis"><a class="markdownIt-Anchor" href="#memcached-vs-redis"></a> Memcached Vs Redis</h2><ul><li>Storing JSON<ul><li>in Memcached, use a serialized string</li><li>in Redis, use a hash</li></ul></li></ul><h2 id="memcached-mem-cache-d-store-option"><a class="markdownIt-Anchor" href="#memcached-mem-cache-d-store-option"></a> Memcached (mem cache d) store Option</h2><ul><li>Free and open source</li><li>Max object size 1 MB</li><li>Total max size 7 TiB</li><li>No persistence; easy to add nodes</li></ul><h2 id="redis-store-option"><a class="markdownIt-Anchor" href="#redis-store-option"></a> Redis Store option</h2><ul><li><p>Free and open source</p></li><li><p>Max object size 512 MB</p></li><li><p>Total max size 3.5 TiB</p></li><li><p>Persistence; read replicas</p></li><li><p>Supports notifications via Redis Pub/Sub channels</p></li><li><p>Supports more data structures, including bitmaps, HyperLogLogs, and <strong>geospatial</strong> commands (geo indexes with radius queries), as well as those supported by Memcached</p></li><li><p>Supports automatic sorting of data</p></li><li><p>Supports HA and failover</p><ul><li>Failover is automatic: the read replica with the lowest replication lag is promoted, and DNS is switched automatically</li></ul></li><li><p>An API is provided to query all 
read replica endpoints</p></li><li><p>Sharding: 16384 hash slots (automatic client-side sharding; developers must use a Redis Cluster client)</p></li><li><p>Standard Redis use cases</p><ul><li>Leaderboards</li><li>Counters; likes &amp; dislikes</li></ul></li></ul><h3 id="redis-key-feature"><a class="markdownIt-Anchor" href="#redis-key-feature"></a> Redis key features</h3><ul><li>Set: an unordered collection of unique values stored under one key</li><li>Sorted Set: like a set, but each member has a score and members are kept sorted by score</li><li>List: push/pop; works like a queue</li><li>Hashes: field-value pairs stored under one key</li></ul><h2 id="updating-cache"><a class="markdownIt-Anchor" href="#updating-cache"></a> Updating cache</h2><ul><li>Database triggers, or database events handled by Lambda, can be used to trigger cache updates</li><li>The application can also trigger cache updates</li></ul><h2 id="caching-strategies"><a class="markdownIt-Anchor" href="#caching-strategies"></a> Caching Strategies</h2><ul><li>Lazy loading: read from the cache; on a miss, load from the DB and update the cache</li><li>Write-through: write to the DB, then write to the cache</li><li>Add a TTL (Memcached supports seconds; Redis supports seconds or milliseconds)</li></ul><p>Elastic Cache vs CloudFront<br>Elastic Cache: designed to cache in memory, e.g. query results (dynamic cache)<br>CloudFront: designed to cache at edge locations (CDN)</p><p>A <strong>Node</strong> is the smallest building block of an ElastiCache deployment.<br><a href="https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/WhatIs.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/WhatIs.html</a></p><p>Redis append-only files (AOF): a Redis persistence mechanism<br>Redis Multi-AZ with Auto Failover</p><h1 id="use-scenarios"><a class="markdownIt-Anchor" href="#use-scenarios"></a> Use Scenarios</h1><ul><li>Session Replication</li><li>IoT Device Data: make use of the 4.x million writes/second capability<br><img 
src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/031_ElasticCache_IoT.png?raw=true" alt="IoT with Elastic Cache"></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/031_ElasticCache_IoT2.png?raw=true" alt="IoT with Elastic Cache - another use case "></p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/031_ElasticCache_IoT3.png?raw=true" alt="IoT with Elastic Cache -  use case 3 "></p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/031_ElasticCache_IoT4.png?raw=true" alt="IoT with Elastic Cache -  use case 4 "></p><ul><li>GEO advertising (Redis supports geospatial commands)</li></ul><h1 id="architecture"><a class="markdownIt-Anchor" href="#architecture"></a> Architecture</h1><ul><li>Master node + read replica (which also serves as the failover node)</li><li>Cluster mode: data is sharded</li></ul><h2 id="how-to-use-cluster-mode"><a class="markdownIt-Anchor" href="#how-to-use-cluster-mode"></a> how to use Cluster Mode</h2><ul><li><p>16384 hash slots for keys; the slots are distributed across the cluster's shards</p></li><li><p>MUST USE a <strong>Redis Cluster Client</strong></p><ul><li>the client stores a map of slots to shards, like</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">shard1 = slots 0-3276</span><br><span class="line">shard2 = slots 3277-6553</span><br></pre></td></tr></table></figure></li><li><p>a cluster can have <strong>max=15 shards; max 5 replicas per shard</strong></p></li></ul><h2 id="how-to-migrate"><a class="markdownIt-Anchor" href="#how-to-migrate"></a> How to migrate</h2><ul><li>Migrate from single server mode to cluster mode</li></ul><blockquote><p>Create new cluster mode cluster --&gt; Snapshot old one --&gt; restore snapshot on new mode cluster --&gt; update client --&gt; delete old 
one</p></blockquote><ul><li>Migrate from a cluster-mode cluster with N shards to one with M shards</li></ul><blockquote><p>Create new cluster mode cluster --&gt; Snapshot old one --&gt; restore snapshot on new mode cluster --&gt; delete old one</p></blockquote><h1 id="tuning"><a class="markdownIt-Anchor" href="#tuning"></a> Tuning</h1><ul><li>Set reserved memory to 30%</li><li>Swap should be 0 or very low; if not, scale up</li><li>Deploy read replicas in a different AZ from the master</li><li>Take snapshots from a read replica</li><li><strong>Odd number</strong> of shards (even numbers are supported but not recommended)</li><li><strong>Russian Doll Caching</strong> : The technique of nesting fragment caches to maximize cache hits is known as Russian doll caching. Nesting fragment caches ensures that caches can be reused even when content changes.</li><li><strong>Thundering Herd</strong> : a sudden rise in the number of cache misses --&gt; spike in database load<ul><li>App startup --&gt; script to pre-populate the cache</li><li>Adding/removing nodes --&gt; scale nodes gradually</li><li>Key expiration (TTL) --&gt; randomize the TTL</li><li>Out of cache memory --&gt; monitor cache evictions</li></ul></li><li>Failover requires updating the DNS CNAME, so be careful with applications that cache the CNAME</li></ul><h1 id="caching-strategies-with-database"><a class="markdownIt-Anchor" href="#caching-strategies-with-database"></a> Caching Strategies with Database</h1><p>Option 1: Caching the row<br>Option 2: Caching a transformed object, such as a JSON string<br>Option 3: Caching a serialized application object<br>Option 4: Caching as a Redis-specific data type (such as a sorted set)</p><h1 id="hands-on"><a class="markdownIt-Anchor" href="#hands-on"></a> Hands on</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">https://github.com/mikelabib/elasticache-memcached-php-demo</span><br><span class="line"></span><br><span class="line">Step 1: Install PHP, Apache, and the memcache client on the server.</span><br><span class="line">e.g. yum install php httpd php-pecl-memcache</span><br><span class="line"></span><br><span class="line">Step 2: Update the /etc/php.ini file params (one tcp:// entry per cache node):</span><br><span class="line">session.save_handler = memcache</span><br><span class="line">session.save_path = &quot;tcp://elasticache-memcache-node1-endpoint:11211,tcp://elasticache-memcache-node2-endpoint:11211&quot;</span><br><span class="line"></span><br><span class="line">Step 3: Configure the php.d/memcache.ini param values:</span><br><span class="line">memcache.hash_strategy = consistent</span><br><span class="line">memcache.allow_failover = 1</span><br><span class="line">memcache.session_redundancy = 3</span><br><span class="line"></span><br><span class="line">Step 4: Restart httpd</span><br><span class="line">e.g. /etc/init.d/httpd restart</span><br></pre></td></tr></table></figure><h1 id="cost"><a class="markdownIt-Anchor" href="#cost"></a> Cost</h1><ul><li>Cross-AZ data transfer is free</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p><a href="https://youtu.be/e9sN15a7utI" target="_blank" rel="noopener">https://youtu.be/e9sN15a7utI</a></p></blockquote><blockquote><p>Deep Dive<br><a href="https://youtu.be/zmDUDSYnAv4" target="_blank" rel="noopener">https://youtu.be/zmDUDSYnAv4</a></p></blockquote><blockquote><p>Code Sample</p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Elastic Cache </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - High Available and Fault Tolerant Architecture</title>
      <link href="2018/04/19/markdown/AWS/AWS2018/030_HA_FailOver_Archi_Handson/"/>
      <url>2018/04/19/markdown/AWS/AWS2018/030_HA_FailOver_Archi_Handson/</url>
      
        <content type="html"><![CDATA[<h1 id="085mp4-ha-and-fault-torerant-architecture-hands-on-overview"><a class="markdownIt-Anchor" href="#085mp4-ha-and-fault-torerant-architecture-hands-on-overview"></a> 085.mp4 – HA and Fault-Tolerant Architecture Hands-on Overview</h1><h2 id="086mp4-focusing-on-vpc"><a class="markdownIt-Anchor" href="#086mp4-focusing-on-vpc"></a> 086.mp4 – Focusing on VPC</h2><h3 id="advanced-vpc-architecture"><a class="markdownIt-Anchor" href="#advanced-vpc-architecture"></a> Advanced VPC Architecture</h3><ul><li>The Advanced VPC Architecture can be duplicated into different regions using CloudFormer<ul><li><strong>Route53</strong> (global) routes cross-region requests to the IGW in each region and to the CloudFront distribution<ul><li>Route53 is a global service</li></ul></li><li><strong>CloudFront</strong> (global) caches content whose origin is the S3 bucket.<ul><li>CloudFront is a global service</li></ul></li><li>2 <strong>VPC</strong>: in different regions.<ul><li>A VPC can’t span regions</li></ul></li><li>2 <strong>S3</strong>: each region has 1 S3 bucket, linked to the VPC in the same region through a “Service Endpoint”; one bucket is a replica of the other.<ul><li>An S3 bucket can’t span regions</li><li>S3 can connect to a VPC via a “service endpoint”</li><li>An S3 bucket can have a replica in another region</li></ul></li><li>2 <strong>ELB</strong>: each region has one ELB, which receives requests through the IGW and balances them across instances in multiple AZs.<ul><li>ELB can span AZs and balance requests across AZs</li><li>ELB can’t span regions</li></ul></li><li>4 <strong>Availability Zones</strong>: each region has 2 AZs. EC2 instances and Aurora DB instances are split across 2 regions, with 2 AZs in each region.</li><li>2 <strong>AutoScaling Group</strong>: each region has 1 Auto Scaling group containing the EC2 cluster. 
Each Auto Scaling group spans 2 AZs.</li><li>4 <strong>Public Subnet</strong>: each region has 1 VPC, and each VPC has 2 public subnets in different AZs.<ul><li>A subnet can’t span AZs (each subnet has a property that specifies its AZ).</li><li>Each public subnet contains part of the EC2 cluster (the instances belonging to the Auto Scaling group)</li></ul></li><li>2 <strong>NAT Service</strong>: each region has 1 NAT service in one of the public subnets, providing NAT for instances in the private subnets.</li><li>4 <strong>Private Subnet</strong>: each region has 2 private subnets in different AZs. These subnets contain the data services.<ul><li>A subnet can’t span AZs.</li></ul></li><li>5 <strong>Aurora service</strong>: each AZ has at least one Aurora instance.<ul><li>The primary Aurora instance and the standby are in the same region but in different AZs.</li><li>One AZ hosts the primary Aurora instance; the other AZs host read replicas.</li><li>Aurora supports cross-region read replicas<ul><li><a href="https://aws.amazon.com/blogs/aws/new-cross-region-read-replicas-for-amazon-aurora/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/aws/new-cross-region-read-replicas-for-amazon-aurora/</a></li></ul></li></ul></li><li>2 <strong>DB Subnet Group</strong>: each region has one</li></ul></li></ul><h3 id="hands-on-by-creating-one-vpc-in-one-region-that-meet-the-advanced-vpc-archi-design"><a class="markdownIt-Anchor" href="#hands-on-by-creating-one-vpc-in-one-region-that-meet-the-advanced-vpc-archi-design"></a> Hands-on: create one VPC in one region that meets the Advanced VPC architecture design</h3><ul><li>Use the wizard to create the VPC, then review the configuration<ul><li>choose the wizard option that creates a VPC with 1 public subnet and 1 private subnet</li><li>select a CIDR for each subnet and the same AZ for both subnets</li><li>Specify the NAT instance type and key pair</li><li>Specify the S3 Service Endpoint access level (none / public only / private only / 
both; full access / custom)</li></ul></li><li>Understand the route tables used / created for the public and private subnets<ul><li>In the public subnet’s route table, 0.0.0.0/0 points to the IGW; in the private subnet’s route table, 0.0.0.0/0 points to the NAT instance’s ENI</li><li>A route that sends S3 requests to the VPC’s S3 service endpoint</li><li>A local route for traffic inside the VPC</li><li>The public subnet’s route table is explicitly associated with it; the private subnet’s route table is implicitly associated</li></ul></li><li>Check the default ACL: it allows all inbound/outbound traffic</li><li>Check the NAT instance created by the wizard<ul><li>Its virtualization type is paravirtual (newer EC2 instances mostly use HVM)</li><li>The NAT security group allows all traffic by default</li></ul></li><li>To fix the above issues,<ul><li>Change the network interface so that it is not deleted on instance termination</li><li>Terminate the current NAT instance and check that the network interface’s status becomes “available”</li><li>Create a new VPC security group for the NAT instance<ul><li>allow inbound HTTP(S) from the private subnet</li><li>allow inbound SSH from the current client IP</li><li>allow outbound HTTP(S) to anywhere</li><li>allow all inbound traffic from the same security group (?)</li></ul></li><li>Change the network interface’s security group to the new security group we just created.<ul><li>A VPC security group is bound to the instance (attaching it to the network interface is the same as attaching it to the instance)</li></ul></li><li>Review VPC security groups vs ACLs (access control lists)</li></ul></li></ul><h2 id="087mp4-going-on-with-re-create-the-nat-instance"><a class="markdownIt-Anchor" href="#087mp4-going-on-with-re-create-the-nat-instance"></a> 087.mp4 – Re-create the NAT instance</h2><ul><li>When creating the new NAT instance, search the community AMIs for the keyword “NAT HVM” and select an existing image</li><li>Select the existing public subnet for the new NAT instance</li><li>Disable “Assign public IP”, because we will attach the existing network interface to it</li><li>Network Interfaces section: attach the existing interface to the new NAT instance</li><li>Select the newly created NAT security group</li><li>Review and launch the instance</li></ul><h2 id="088mp4-in-the-same-region-create-subnet-in-another-az-and-create-acl-for-all-subnets"><a class="markdownIt-Anchor" href="#088mp4-in-the-same-region-create-subnet-in-another-az-and-create-acl-for-all-subnets"></a> 088.mp4 – In the same region, create subnets in another AZ and create ACLs for all subnets</h2><ul><li><p>Create the new private subnet in the same VPC but a different AZ (size it accordingly).</p></li><li><p>Create the new public subnet in the same VPC but a different AZ. 
(size it accordingly)</p></li><li><p>Review the route tables associated with both new subnets</p><ul><li>The route table associated by default with the new public subnet is wrong: it is the route table used by the private subnet. Change it to the other route table, which routes internet traffic to the IGW.</li></ul></li><li><p>Create a new ACL called “Public NACL” in the existing VPC.</p><ul><li>Each ACL rule has a number; rules are evaluated in ascending number order</li><li>allow inbound HTTP(S) from the internet and inbound SSH from the client</li><li>allow inbound 1024-65535 from the internet (used by the ELB health check)<ul><li><a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html</a></li></ul></li><li>allow outbound HTTP(S) to the internet, outbound SSH to all private subnets, outbound 1024-65535 to the <strong>internet</strong>, and outbound 3306 (MySQL) to all private subnets</li><li>associate the new ACL with the 2 public subnets</li></ul></li><li><p>Create a new ACL called “private NACL” in the existing VPC</p><ul><li>Allow inbound MySQL (3306) from both public subnets; allow inbound SSH from both public subnets; allow <strong>inbound 32768-61000 from the internet (NAT)</strong></li><li>Allow outbound HTTP(S) to the internet; allow <strong>outbound 32768-61000 (MySQL responses)</strong> to the public subnets</li><li>Associate the new ACL with the 2 private subnets</li></ul></li><li><p>Both the inbound and the outbound ACL rules end with a default deny rule.</p></li><li><p>If ping doesn’t work, check that ICMP is allowed at both the security group and ACL level.</p></li><li><p>If SSH doesn’t work, check that the ACL allows outbound 32768-61000 to the internet: these are the ports the SSH responses return to on the SSH client.</p></li></ul><h2 id="089mp4-creating-autoscaling-group-in-existing-vpc-create-elb-to-dispatch-requests"><a class="markdownIt-Anchor" 
href="#089mp4-creating-autoscaling-group-in-existing-vpc-create-elb-to-dispatch-requests"></a> 089.mp4 – Create an AutoScaling group in the existing VPC; create an ELB to dispatch requests</h2><ul><li><p>Create a security group for the web server EC2 instances</p><ul><li>inbound HTTP(S) from the internet; SSH from the local client; all traffic from the group itself</li><li>outbound all traffic to anywhere</li></ul></li><li><p>Create a Load Balancer (EC2 --&gt; Network &amp; Security --&gt; Load Balancers)</p><ul><li>select the current VPC for the new load balancer</li><li>forward HTTP from port 80 to port 80 (if HTTPS is selected, an SSL certificate must be uploaded to the ELB)</li><li>select both public subnets as the subnets to dispatch to</li><li>attach the newly created web server security group</li><li>configure the health check used by the ELB</li><li>do not select any existing EC2 instances (the AutoScaling group will provide them), and enable cross-zone load balancing and connection draining</li></ul></li><li><p>Edit the load balancer’s default configuration:</p><ul><li>(this setting has since moved to the ELB target group): enable load-balancer-generated stickiness with a 60-second expiration</li></ul></li><li><p>Create the AutoScaling group by creating a launch configuration and an AutoScaling group</p><ul><li>Create the launch configuration<ul><li>select the AMI (search for a WordPress AMI)</li><li>configure the instance settings (monitoring / role / script) and disable public IP assignment</li><li>attach the newly created web security group</li></ul></li><li>Create the AutoScaling group from the existing launch configuration<ul><li>start with 2 instances</li><li>select the existing VPC</li><li>multi-select the 2 public subnets</li><li>select “receive traffic from Elastic Load Balancer(s)” and select the newly created ELB</li><li>set the health check type to “ELB”</li><li>“Configure scaling policies”:<ul><li>scale between 2 and 10 instances</li><li>scale up: create a CPU usage alarm that triggers an increase; wait 600 seconds before allowing another 
scaling activity</li><li>scale down: similar</li></ul></li><li>optional: add a notification when scaling happens</li></ul></li><li>Review the result<ul><li>2 web instances are created</li><li>the scaling history shows the actions taken</li><li>review and edit the ELB’s health check configuration (interval from 30 to 60 sec)</li></ul></li><li>Terminate one web server manually and check that a replacement is launched automatically</li></ul></li></ul><h2 id="090mp4-rds-aurora-service-creation-and-configuration"><a class="markdownIt-Anchor" href="#090mp4-rds-aurora-service-creation-and-configuration"></a> 090.mp4 – RDS Aurora service creation and configuration</h2><ul><li>Create a security group for the RDS service<ul><li>allow inbound MySQL (3306) from the web server security group</li><li>allow outbound HTTP(S) to the internet</li></ul></li><li>Edit the earlier web security group<ul><li>allow outbound traffic to the newly created DB security group</li></ul></li><li>Switch to the RDS service and create a &quot;DB Subnet Group&quot; so DB instances can be launched in multiple AZs<ul><li>create the group and select the existing VPC</li><li>multi-select our 2 private subnets</li></ul></li><li>Launch the RDS instance<ul><li>select the RDS engine type; enable Multi-AZ; select the DB engine version; select the instance type</li><li>set the instance name, user, and password</li><li>select the existing VPC and the newly created DB subnet group</li><li>disable public access to the DB instance</li><li>attach the newly created DB security group</li><li>set the database name and other settings (port / encryption / backup and retention / maintenance)</li></ul></li><li>Review the result</li><li>Create a read replica of the database<ul><li>select the newly created “DB subnet group” and the AZ where the read replica should run (instead of a subnet, you select an AZ here; each AZ was already bound to a subnet when the DB subnet group was created)</li></ul></li><li>Review the result: the primary database and the read replica have different endpoints. How should the web application 
handle this?<ul><li>an ELB can’t be used to dispatch requests to RDS</li><li>a possible solution is to set up HAProxy</li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS Architecture </tag>
            
            <tag> Fault Tolerant Architecture </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Auto Scaling</title>
      <link href="2018/04/19/markdown/AWS/AWS2018/029_AutoScaling/"/>
      <url>2018/04/19/markdown/AWS/AWS2018/029_AutoScaling/</url>
      
        <content type="html"><![CDATA[<h1 id="083mp4-084mp4-auto-scaling"><a class="markdownIt-Anchor" href="#083mp4-084mp4-auto-scaling"></a> 083.mp4 084.mp4 – Auto Scaling</h1><ul><li><p>can scale up / down</p></li><li><p>only scales horizontally (adds or removes instances)</p></li><li><p>can scale across AZs (but not across regions)</p></li><li><p>parameters: min size; max size; desired capacity (initial size)</p></li><li><p>“Launch configuration”</p><ul><li>AMI type; Instance Type; Key pair; Security Groups</li><li>Optional: Spot Instance bid price</li></ul></li><li><p>Autoscaling group</p><ul><li>unhealthy instances are terminated and replaced</li></ul></li><li><p>Scaling plans (?)</p></li><li><p>Scaling Policy</p><ul><li>how it is triggered: a CloudWatch alarm plus a policy that decides how to scale</li><li>adjustment types: ChangeInCapacity; ExactCapacity; PercentChangeInCapacity</li></ul></li><li><p>Scaling Policy Types:</p><ul><li>Simple Scaling</li><li>Step Scaling (new feature): allows graduated adjustments, e.g. add 20% capacity when 40% &lt; CPU &lt; 70%</li></ul></li></ul><p>AutoScaling Margin</p><ul><li>During rebalancing, when instances become unbalanced across Availability Zones, Auto Scaling can temporarily exceed the group’s maximum size by a margin (10% or 1 instance, whichever is greater)<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html</a></li></ul><p>Scaling options<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/scaling_plan.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/scaling_plan.html</a></p><p>Save cost (termination policies)<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html</a></p><p>PercentChangeInCapacity<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_PutScalingPolicy.html" target="_blank" 
rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_PutScalingPolicy.html</a></p><p>Order of execution for scheduled actions; conflicting times<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/schedule_time.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/schedule_time.html</a></p><p>AutoScaling Lifecycle Hooks; Action Result: ABANDON / CONTINUE<br>An automatically launched instance can be given time to install software and apply its configuration; when everything is ready, a result is sent back (ABANDON or CONTINUE). On CONTINUE, the instance joins the group.<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html</a></p><p>Health check grace period<br>The time Auto Scaling waits before checking the health of a newly launched instance (so the instance is fully ready before it is checked).<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html</a></p><p>AutoScaling Standby State<br>Put a suspect instance into the Standby state (it is still part of the group), troubleshoot it, and return it to service once it is fixed.<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-enter-exit-standby.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-enter-exit-standby.html</a></p><p>Merge single-zone Auto Scaling groups into a multi-zone group<br>Identical groups running in different AZs can be merged into one group.<br><a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/merge-auto-scaling-groups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/merge-auto-scaling-groups.html</a></p><p>describe-scaling-activities command<br><a href="https://docs.aws.amazon.com/cli/latest/reference/autoscaling/describe-scaling-activities.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/cli/latest/reference/autoscaling/describe-scaling-activities.html</a></p><p>Error message: Autoscaling: &lt;number of instances&gt; instance(s) are already running.<br>This means the group has hit the maximum instance count configured for it.<br><a 
href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/ts-as-capacity.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/autoscaling/ec2/userguide/ts-as-capacity.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Auto Scaling </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - ELB</title>
      <link href="2018/04/18/markdown/AWS/AWS2018/028_ELB/"/>
      <url>2018/04/18/markdown/AWS/AWS2018/028_ELB/</url>
      
        <content type="html"><![CDATA[<h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><p><strong>SSL Offloading</strong></p><p>Example: the POODLE issue. 62% of ELBs were updated within 24 hours</p><p><strong>Proxy Protocol</strong></p><ul><li>for the TCP layer (layer 4)</li></ul><blockquote><p><a href="https://www.52os.net/articles/PROXY_protocol_pass_client_ip.html" target="_blank" rel="noopener">https://www.52os.net/articles/PROXY_protocol_pass_client_ip.html</a></p></blockquote><p><strong>X-Forwarded-For</strong></p><ul><li>designed for HTTP (application layer)</li></ul><blockquote><p><a href="https://en.wikipedia.org/wiki/X-Forwarded-For" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/X-Forwarded-For</a></p></blockquote><h2 id="elb-architectures"><a class="markdownIt-Anchor" href="#elb-architectures"></a> ELB architectures</h2><ul><li>ELB runs in its own VPC, managed by AWS</li><li>ELB load balancing normally relies on Route 53<ul><li>Route 53 uses round robin to dispatch requests to ELB nodes in different AZs</li><li>Route 53 can health-check the ELB nodes (150 sec to shift all traffic to healthy ELB nodes)</li><li>ELB DNS records change over time, so Route53 shouldn’t point to an ELB IP address.<ul><li>The records change when the ELB scales; they don’t change on failure (failures are handled automatically)</li><li>When an IP is removed, it is drained and quarantined for 7 days</li></ul></li></ul></li><li>ELB is a regional service that automatically supports multiple AZs; it can also do cross-zone load balancing (requires a configuration change)<ul><li>this solves the problem of traffic imbalance between zones</li><li>EC2 charges for inter-AZ traffic, but traffic generated by cross-zone load balancing on Classic and Application ELBs is free (on Network ELBs it is not)</li><li><a href="https://aws.amazon.com/elasticloadbalancing/faqs/#pricing" target="_blank" rel="noopener">https://aws.amazon.com/elasticloadbalancing/faqs/#pricing</a></li></ul></li></ul><h2 id="elb-load-balancer"><a 
class="markdownIt-Anchor" href="#elb-load-balancer"></a> ELB load balancer</h2><p>Classic ELB</p><table><thead><tr><th></th><th>TCP / SSL</th><th>HTTP / HTTPS</th></tr></thead><tbody><tr><td>network layer</td><td>layer 4</td><td>layer 7</td></tr><tr><td>connection</td><td>passed through</td><td>terminated &amp; pooled (optional: SSL offloading)</td></tr><tr><td>payload</td><td>no modification</td><td>headers might be modified</td></tr><tr><td>client IP passed via</td><td>Proxy Protocol</td><td>X-Forwarded-For header</td></tr><tr><td>algorithm</td><td>round-robin</td><td>Least-Outstanding-Requests (can be combined with sticky sessions)</td></tr></tbody></table><p>The new Application ELB adds:</p><ul><li>Path-based routing (content-based routing)</li><li>HTTP/HTTPS only</li><li>container support</li><li>native WebSocket and HTTP/2</li></ul><p>Latest categories:</p><p>Elastic Load Balancing supports three types of load balancers. You can select the appropriate load balancer based on your application needs. If you need flexible application management and TLS termination, we recommend the Application Load Balancer. If your application needs extreme performance and a static IP, we recommend the Network Load Balancer. 
If your application is built within the EC2-Classic network, you should use the Classic Load Balancer.</p><p>The Classic Load Balancer is ideal for simple load balancing of traffic across multiple EC2 instances, while the Application Load Balancer is ideal for applications needing advanced routing capabilities, <strong>microservices, and container-based architectures</strong>.</p><h2 id="new-feature-with-application-load-balancer"><a class="markdownIt-Anchor" href="#new-feature-with-application-load-balancer"></a> New features with Application Load Balancer</h2><h3 id="content-based-routing"><a class="markdownIt-Anchor" href="#content-based-routing"></a> Content-based routing</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/028_ELBContentBasedRouting.png?raw=true" alt="ELB content based routing"></p><ul><li>A very important feature for microservices.</li><li>Previously, if different services were hosted on different clusters, you needed a separate ELB to route requests to each cluster.</li></ul><p>Should we then use a single Application ELB for everything, since it supports content-based routing?</p><ul><li>Consider the blast radius and isolation</li><li>It becomes a single point of failure (SPOF) and a configuration nightmare</li></ul><h3 id="support-for-container-based-architecture-and-micro-service-architecture"><a class="markdownIt-Anchor" href="#support-for-container-based-architecture-and-micro-service-architecture"></a> Support for container-based and microservice architectures</h3><ul><li>Multi-port<ul><li>Classic: one port per instance; Application ELB: multiple ports per instance</li></ul></li><li>ECS works with the ELB and updates its configuration seamlessly</li></ul><h3 id="better-api-to-interact-with-application-elb"><a class="markdownIt-Anchor" href="#better-api-to-interact-with-application-elb"></a> Better API to interact with Application ELB</h3><ul><li>Listener: protocol &amp; port<ul><li>one ELB can have min 1 and max 10 
listeners</li><li>Routing rules are defined per listener</li></ul></li><li>Target Groups: groups of EC2 instances, containers, or microservices<ul><li>A target group can be associated with an AutoScaling group</li></ul></li><li>Targets: EC2 instances, containers, or microservices<ul><li>A single target can be registered with multiple target groups</li></ul></li><li>Rules:<ul><li>link listeners to target groups</li><li>limit of 10-100 rules per ELB (this keeps changing)</li></ul></li><li>Delete protection (prevents deletion via the API)</li></ul><h3 id="better-support-for-websocket-http2-real-time-streaming"><a class="markdownIt-Anchor" href="#better-support-for-websocket-http2-real-time-streaming"></a> Better support for WebSocket, HTTP/2, real-time streaming</h3><p>On by default:<br>WebSocket: a long-lived, two-way connection between client and server<br>HTTP/2: better performance by supporting concurrent requests over one connection<br>Real-time streaming</p><h3 id="other-improvements"><a class="markdownIt-Anchor" href="#other-improvements"></a> Other improvements</h3><ul><li>Native support for IPv6</li><li>Improved health checks: the health check result (reason) is reported</li><li>Fail open: if every target fails its health check (e.g. because the check is misconfigured or too deep), the ELB still routes requests to all targets.<br><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/028_MultiZoneBenefit.png?raw=true" alt="image"></li><li>Running in 3 zones is cheaper if we want to guarantee the same capacity after failover when 1 zone fails (with 3 zones each needs 50% of the required capacity, 150% in total; with 2 zones each needs 100%, 200% in total).</li><li>Cross-zone load balancing is <strong>always</strong> enabled for Application ELB.</li></ul><h3 id="acm-integration"><a class="markdownIt-Anchor" href="#acm-integration"></a> ACM integration</h3><ul><li>Fully integrated with ELB (free)</li><li>Certificates renew automatically</li></ul><h3 id="web-application-firewall"><a class="markdownIt-Anchor" href="#web-application-firewall"></a> Web Application Firewall</h3><p><img 
src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/028_WebApplicationFirewall.png?raw=true" alt="image"></p><h2 id="elb-health-check"><a class="markdownIt-Anchor" href="#elb-health-check"></a> ELB Health check</h2><ul><li>TCP health check or HTTP health check<ul><li>A TCP check sets a very low bar (it only confirms that the port accepts connections)</li><li>Use an HTTP check that expects a 2xx response if possible</li></ul></li><li>Customize the frequency and failure thresholds</li><li>Consider the depth and accuracy of the health check<ul><li>If the check includes the backend DB, then 1 DB failure might result in all web servers being marked unhealthy</li></ul></li></ul><h2 id="elb-offloading"><a class="markdownIt-Anchor" href="#elb-offloading"></a> ELB offloading</h2><ul><li>SSL protocols supported: TLS 1.0, 1.1, 1.2 and SSLv3</li><li>“Server Order Preference” is used to negotiate the cipher: the server walks its own list and picks the first cipher the client supports.<ul><li>So the server can put its most preferred cipher at the top of its list.</li></ul></li></ul><h2 id="elb-key-configurations"><a class="markdownIt-Anchor" href="#elb-key-configurations"></a> ELB key configurations</h2><ul><li>Idle Timeout<ul><li>Default 60 sec; allowed range 1 to 4,000 sec (just over 1 hour)</li><li>Recommended setting: how long a customer is willing to wait for results (before clicking retry)</li><li>Anything that takes longer should be treated as an error.</li></ul></li></ul><h2 id="elb-metrics-integrate-with-cloudwatch"><a class="markdownIt-Anchor" href="#elb-metrics-integrate-with-cloudwatch"></a> ELB metrics, Integrate with CloudWatch</h2><ul><li>ELB metrics have 1-minute granularity by default.</li></ul><p>Useful metrics:</p><ul><li>HealthyHostCount; UnhealthyHostCount.<ul><li>set the health check timeout carefully so that instances are marked unhealthy correctly</li></ul></li><li>Latency: time from the ELB sending a request to a backend until the ELB receives the response<ul><li>Min / Max / Average latency provided</li><li>Debug individual requests using 
<strong>Access Logs</strong></li></ul></li><li>SurgeQueue (max length = 1024) and SpillOvers<ul><li>this is the old design; the newer idea is to fail early</li><li>the surge queue holds pending requests; spillovers are requests rejected when the surge queue is full (clients receive HTTP 503 Service Unavailable or HTTP 504 Gateway Timeout errors)</li></ul></li></ul><p>Updates with Application ELB</p><ul><li>Metrics can be collected at the target group level or the ELB level</li><li>Application ELB removed the surge queue; use the “RejectedConnectionCount” metric instead</li><li>“<strong>CloudWatch Percentiles</strong>”<ul><li>P99: the response time that 99% of customer requests stay within.</li></ul></li><li><strong>Request Tracing</strong><ul><li>The ELB adds an X-Amzn-Trace-Id header, which is used to trace a request through all ELBs (useful when services depend on each other); the header is logged to S3.</li></ul></li></ul><h2 id="elb-integrate-with-autoscaling"><a class="markdownIt-Anchor" href="#elb-integrate-with-autoscaling"></a> ELB integrate with AutoScaling</h2><p>AutoScaling can scale based on metrics collected by the ELB</p><blockquote><p>Don’t just look at peak time. Sometimes AutoScaling shrinks the cluster too much, which also results in a bottleneck.</p></blockquote><h2 id="elb-sticky-sessions"><a class="markdownIt-Anchor" href="#elb-sticky-sessions"></a> ELB sticky sessions</h2><ul><li>Sticky sessions let the ELB always route a given user to the same backend instance; if that instance becomes unhealthy, the user is moved to another instance</li><li>Application-controlled: the application decides, through its own cookie, whether and for how long the ELB keeps the session sticky</li><li>Sticky sessions are provided but not recommended. 
Storing session state in ElastiCache is recommended instead.</li></ul><h2 id="best-practise-trouble-shooting"><a class="markdownIt-Anchor" href="#best-practise-trouble-shooting"></a> Best Practice &amp; Troubleshooting</h2><blockquote><p>Issue: tablets, clients, or ISPs cache the DNS-resolved IP address, so problems occur when the ELB IP changes.</p></blockquote><p>To solve the problem,</p><ol><li>When registering with Route53, use a wildcard alias such as <strong>*.mydomain.com</strong></li><li>When the application needs to refresh its connection, prepend a GUID to the hostname to force a fresh DNS resolution, e.g. send the request to <a href="http://55152E66-3C6A-4F6D-B1B0-D05C506F0528.mydomain.com" target="_blank" rel="noopener">55152E66-3C6A-4F6D-B1B0-D05C506F0528.mydomain.com</a></li></ol><blockquote><p>Enable access logs for troubleshooting</p></blockquote><p>Once enabled, every request is logged to S3 for analysis (published every 5 min or every hour)<br>Analyze with Splunk, EMR, Hive, etc.<br>Log files are organized by date, and their names include the ELB IP address<br>The log content has detailed information for finding which URLs respond slowly.</p><blockquote><p>Methodology when something fails</p></blockquote><p>Mitigation; Isolation; Restore Redundancy</p><h1 id="something-to-memorize"><a class="markdownIt-Anchor" href="#something-to-memorize"></a> Something to memorize</h1><p>ELB Access Log Naming Convention</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">s3://my-loadbalancer-logs/my-app/AWSLogs/123456789012/elasticloadbalancing/us-west-2/2014/02/15/123456789012_elasticloadbalancing_us-west-2_my-loadbalancer_20140215T2340Z_172.160.001.192_20sg8hgm.log</span><br></pre></td></tr></table></figure><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>Best Practice 2014</p></blockquote><p><a href="https://youtu.be/K-YFw9-_NPE" target="_blank" 
rel="noopener">https://youtu.be/K-YFw9-_NPE</a></p><blockquote><p>Best Practice 2016</p></blockquote><p><a href="https://youtu.be/qy7zNaDTYGQ" target="_blank" rel="noopener">https://youtu.be/qy7zNaDTYGQ</a></p><blockquote><p>081.mp4 082.mp4 - ELB Overview</p></blockquote><p>ELB controller service<br><a href="http://jayendrapatil.com/aws-elastic-load-balancing/" target="_blank" rel="noopener">http://jayendrapatil.com/aws-elastic-load-balancing/</a></p><p>ELB troubleshooting (METHOD_NOT_ALLOWED)<br><a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html</a></p><p>Extended reading<br><a href="http://jayendrapatil.com/aws-elastic-load-balancing/" target="_blank" rel="noopener">http://jayendrapatil.com/aws-elastic-load-balancing/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> ELB </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Backup and Disaster Recovery</title>
      <link href="2018/04/18/markdown/AWS/AWS2018/027_BackupNDisasterRecovery/"/>
      <url>2018/04/18/markdown/AWS/AWS2018/027_BackupNDisasterRecovery/</url>
      
        <content type="html"><![CDATA[<h1 id="aws-disaster-recovery"><a class="markdownIt-Anchor" href="#aws-disaster-recovery"></a> AWS Disaster Recovery</h1><p><a href="http://www.ecloudgate.com/Doc/DisasterRecovery_Overview" target="_blank" rel="noopener">http://www.ecloudgate.com/Doc/DisasterRecovery_Overview</a></p><p>Backup and Restore vs Pilot Light vs Warm Standby<br>Cost: cheap --&gt; expensive<br>Recovery speed: slow --&gt; quick<br>RPO: high --&gt; low, i.e. the window of data that can be lost shrinks<br>RTO: high --&gt; low, i.e. the time needed to recover shrinks</p><h1 id="079mp4-080mp4-backup-and-disaster-recovery"><a class="markdownIt-Anchor" href="#079mp4-080mp4-backup-and-disaster-recovery"></a> 079.mp4 080.mp4 - Backup and disaster recovery</h1><h2 id="rpo-and-pto"><a class="markdownIt-Anchor" href="#rpo-and-pto"></a> RPO and RTO</h2><p>Two key parameters for defining disaster recovery requirements; they guide the choice of DR technology.</p><p>RPO: Recovery Point Objective. The age of files that must be recovered from backup storage for normal operations to resume if a computer, system, or network goes down as a result of a hardware, program, or communications failure.<br>The recovery time objective (RTO) is the maximum tolerable length of time that a computer, system, network, or application can be down after a failure or disaster occurs.<br><a href="https://whatis.techtarget.com/definition/recovery-point-objective-RPO" target="_blank" rel="noopener">https://whatis.techtarget.com/definition/recovery-point-objective-RPO</a><br><a href="https://whatis.techtarget.com/definition/recovery-time-objective-RTO" target="_blank" rel="noopener">https://whatis.techtarget.com/definition/recovery-time-objective-RTO</a></p><h2 id="full-backup-vs-incremental-backup"><a class="markdownIt-Anchor" href="#full-backup-vs-incremental-backup"></a> Full backup vs Incremental Backup</h2><ul><li>Full backup: good RTO, worse RPO</li><li>Incremental backup: good RPO, slower RTO</li></ul><h2 id="redundant-array-of-inexpensive-disks-raid"><a 
class="markdownIt-Anchor" href="#redundant-array-of-inexpensive-disks-raid"></a> Redundant Array of Inexpensive Disks (RAID)</h2><ul><li>RAID 0 stripes data across disks (performance, no redundancy); RAID 1 mirrors data across disks (redundancy)</li></ul><h2 id="recovery-by-service"><a class="markdownIt-Anchor" href="#recovery-by-service"></a> Recovery by service</h2><h3 id="ec2-recovery"><a class="markdownIt-Anchor" href="#ec2-recovery"></a> EC2 recovery</h3><ul><li>EC2 image (AMI) --&gt; S3</li><li>EBS -&gt; incremental snapshots to S3</li><li>Set up multiple EBS volumes as RAID 1</li><li>EBS volumes are automatically replicated within their AZ</li></ul><h3 id="rds-recovery"><a class="markdownIt-Anchor" href="#rds-recovery"></a> RDS Recovery</h3><ul><li>Automated: daily backups, retained 1 day by default and up to 35 days; point-in-time restore gives an RPO of about 5 minutes</li><li>Manual snapshots: not deleted when the DB instance is deleted</li></ul><h3 id="dynamodb"><a class="markdownIt-Anchor" href="#dynamodb"></a> DynamoDB</h3><ul><li>No automatic backup</li><li>Back up via table export to S3. The export can be scheduled daily, and you can configure what percentage of the table's throughput it may consume.</li></ul><h2 id="s3"><a class="markdownIt-Anchor" href="#s3"></a> S3</h2><ul><li>Archive to Glacier</li><li>Restoring from Glacier takes 3-5 hours</li></ul><h2 id="storage-gateway"><a class="markdownIt-Anchor" href="#storage-gateway"></a> Storage Gateway</h2><p>Hybrid storage solution</p><ul><li>File Gateway: NFS file shares backed by S3</li><li>Volume Gateway: iSCSI volumes stored in S3, with point-in-time EBS snapshots</li><li>Tape Gateway: iSCSI virtual tape library (VTL) backed by S3 and Glacier</li></ul><p>EBS replication vs RAID (how do they relate, and is RAID still needed if EBS is already replicated?)<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html</a></p>]]></content>
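The worked example below makes the full-versus-incremental trade-off concrete using shell arithmetic. All numbers are hypothetical assumptions, not AWS figures: a daily full backup, hourly incrementals, 60 minutes to restore a full backup and 5 minutes to replay each incremental.

```shell
#!/usr/bin/env bash
# Hypothetical schedule and restore timings (assumptions, not AWS figures).
full_interval_h=24      # one full backup per day
incr_interval_h=1       # one incremental backup per hour
full_restore_min=60     # assumed time to restore a full backup
incr_restore_min=5      # assumed time to replay one incremental

# Worst-case RPO: the failure happens just before the next backup would run.
rpo_full_only_h=$full_interval_h
rpo_with_incr_h=$incr_interval_h

# Worst-case RTO: restore the full backup, then replay every incremental
# taken since it (23 of them if the failure hits just before the next full).
incr_count=$(( full_interval_h / incr_interval_h - 1 ))
rto_full_only_min=$full_restore_min
rto_with_incr_min=$(( full_restore_min + incr_count * incr_restore_min ))

echo "full only:        RPO ${rpo_full_only_h}h, RTO ${rto_full_only_min}min"
echo "full+incremental: RPO ${rpo_with_incr_h}h, RTO ${rto_with_incr_min}min"
```

Under these assumptions, incrementals reduce the worst-case data loss from 24 hours to 1 hour. They also stretch the worst-case restore from 60 to 175 minutes, which is the "good RPO, slower RTO" pattern described for incremental backups.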
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Backup and Disaster Recovery </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Deployment Service</title>
      <link href="2018/04/18/markdown/AWS/AWS2018/026_DeployService/"/>
      <url>2018/04/18/markdown/AWS/AWS2018/026_DeployService/</url>
      
        <content type="html"><![CDATA[<h1 id="077mp4-078mp4-deployment-overview"><a class="markdownIt-Anchor" href="#077mp4-078mp4-deployment-overview"></a> 077.mp4 078.mp4 - Deployment Overview</h1><ul><li><p>Infrastructure as code</p><ul><li>CloudFormation templates; CloudFormation Designer</li></ul></li><li><p>Continuous Deployment</p><ul><li>CodeCommit</li><li>CodePipeline</li><li>Elastic Beanstalk</li><li>OpsWorks</li><li>Elastic Container Service (ECS)</li></ul></li><li><p>Application update: pre-baked AMIs; in-place application update; disposable upgrade</p></li><li><p>Blue-green upgrade</p><ul><li>Staged rollout; needs double the resources; uses Route53 to shift traffic</li></ul></li></ul><p>ELB and Elastic Beanstalk work together:<br>Elastic Beanstalk creates the ELB configuration automatically<br><a href="https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.elb.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.elb.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Deployment </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Route 53, CloudFront and S3 handson</title>
      <link href="2018/04/18/markdown/AWS/AWS2018/025_HandsonWithRoute53CloudFrontS3/"/>
      <url>2018/04/18/markdown/AWS/AWS2018/025_HandsonWithRoute53CloudFrontS3/</url>
      
        <content type="html"><![CDATA[<h1 id="073mp4-074mp4-handson-with-s3-and-route53"><a class="markdownIt-Anchor" href="#073mp4-074mp4-handson-with-s3-and-route53"></a> 073.mp4 074.mp4 – Hands-on with S3 and Route53</h1><ul><li>Purchase a domain</li><li>Create an S3 bucket (use the domain name as the bucket name)</li><li>Download an HTML5 website</li><li>Upload the website files to S3</li><li>Edit the S3 configuration: “Static Website Hosting” --&gt; select index.html</li><li>Open the website and verify that the original bucket endpoint host name works</li><li>Enable versioning</li><li>Delete index.html, then use versioning to revert the deletion (by deleting the delete marker)</li><li>Upload a different version of index.html, then revert the update (by deleting the new version)</li><li>Edit the lifecycle rules: move previous versions to Glacier after 30 days and delete them permanently after another 30 days</li><li>Create another S3 bucket named after the subdomain with the www. prefix</li><li>Instead of hosting a website in this new bucket, select “Redirect all requests to another host” (the purchased domain name)</li></ul><h1 id="075mp4-bring-in-cloudfront"><a class="markdownIt-Anchor" href="#075mp4-bring-in-cloudfront"></a> 075.mp4 – Bring in CloudFront</h1><ul><li>CloudFront can front web content and RTMP (Real-Time Messaging Protocol) streaming media</li><li>Select CloudFront for a website<ul><li>Origin: the domain name of the S3 bucket that holds the content files</li><li>Origin Path: restricts which directory is cached</li><li>Select the allowed protocols (HTTP/HTTPS) and methods</li><li>TTL (set max, min, default)</li><li>Distribution settings (which edge locations to distribute to)</li><li>CNAMEs: enter the purchased domain name</li><li>Set logging and a comment</li></ul></li><li>Wait until the status is “Deployed”, then check the cached content</li><li>Use the CloudFront-assigned domain name to visit the website</li><li>Trigger an “Invalidation” request for the files or folders that match your path list</li></ul><h1 id="076mp4-bring-in-route53"><a class="markdownIt-Anchor" href="#076mp4-bring-in-route53"></a> 076.mp4 – Bring in Route53</h1><h2 id="demo1"><a class="markdownIt-Anchor" href="#demo1"></a> Demo1</h2><ul><li>Route53 --&gt; Hosted zones</li><li>Select the purchased domain and click “Create Record Set”</li><li>Because the website is hosted inside AWS, select “Use Alias”</li><li>Choose the CloudFront URL as the target</li><li>Browse to the domain name to verify it works</li></ul><h2 id="demo2"><a class="markdownIt-Anchor" href="#demo2"></a> Demo2</h2><ul><li>Delete the existing record set</li><li>Create a new record set without an alias, pointing to the public IP of the EC2 instance hosting a WordPress web application</li></ul><h2 id="demo3-failover"><a class="markdownIt-Anchor" href="#demo3-failover"></a> Demo3: Failover</h2><ul><li>Change the routing policy from “Simple” to “Weighted”</li><li>Give “DevID-Main” a weight of 100 (range 0-255)</li><li>Create another record set pointing to CloudFront with ID “DevID-FailOver” and weight 0</li></ul><h2 id="demo4-health-check"><a class="markdownIt-Anchor" href="#demo4-health-check"></a> Demo4: Health check</h2><ul><li>Delete the other records, leaving only the one pointing to the EC2 IP</li><li>Change the routing policy to “Failover” and mark this target as “Primary”</li><li>Associate the record set with a health check</li><li>Create another record pointing to CloudFront with routing policy “Failover” and type “Secondary”</li></ul>]]></content>
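In Demo3, Route53 answers in proportion to each record's weight divided by the sum of all weights in the group. The small shell sketch below shows why weights 100 and 0 send every answer to "DevID-Main". It also covers the documented special case where every weight is 0, in which case Route53 splits traffic evenly. The function name weighted_shares is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected percentage of DNS answers per weighted record,
#   share_i = weight_i / sum(weights)   (integer percent, rounded down)
weighted_shares() {
  local total=0 w
  for w in "$@"; do
    total=$(( total + w ))
  done
  for w in "$@"; do
    if [ "$total" -eq 0 ]; then
      echo $(( 100 / $# ))          # all weights 0: Route53 splits evenly
    else
      echo $(( w * 100 / total ))
    fi
  done
}

weighted_shares 100 0     # DevID-Main, DevID-FailOver
weighted_shares 3 1       # e.g. a 75/25 split
```

The first call prints 100 and 0; the second prints 75 and 25. According to the Route53 documentation, a weight-0 record can still act as a fallback. If all records with a non-zero weight have health checks and all become unhealthy, Route53 then considers the zero-weight records.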
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Route 53 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Route 53</title>
      <link href="2018/04/18/markdown/AWS/AWS2018/024_Route53/"/>
      <url>2018/04/18/markdown/AWS/AWS2018/024_Route53/</url>
      
        <content type="html"><![CDATA[<h1 id="rout53-overview"><a class="markdownIt-Anchor" href="#rout53-overview"></a> Route53 Overview</h1><p>It’s the entry point to the distributed infrastructure behind it.</p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/024_Route53_DNS.png?raw=true" alt="DNS in a nutshell"></p><p>History:</p><ol><li>Static DNS</li><li>Dynamic DNS<ul><li>The same domain name resolves to different results depending on the user's source or other properties</li></ul></li><li>Policy-based DNS</li></ol><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h2><p><strong>Traffic Policy</strong>: rules that route traffic to endpoints<br><strong>Traffic Policy Record</strong>: a domain name with an applied traffic policy version</p><p><strong>Hosted Zones</strong>: containers for the DNS records of a domain and its subdomains<br><strong>DNS Records</strong>: route requests to the correct endpoint address</p><h2 id="steps-to-use-route53"><a class="markdownIt-Anchor" href="#steps-to-use-route53"></a> Steps to use Route53</h2><ol><li>Register your domain name with Route53 or another registrar (updates take some time to process)</li></ol><ul><li>WHOIS privacy protection</li><li>Domains can be transferred between AWS accounts</li></ul><ol start="2"><li>Create a hosted zone</li></ol><ul><li>Option 1: Create a public hosted zone<ul><li>For domains registered with AWS, the hosted zone is created automatically.</li><li>This binds your domain name to AWS name servers.</li></ul></li><li>Option 2: Create a private hosted zone<ul><li>Benefits: 1) any name can be used 2) server IP addresses stay hidden behind domain names</li></ul></li></ul><ol start="3"><li><p>DNS Records</p><ul><li>Record Name<ul><li>The root domain can be a record (<a href="http://yoursite.com" target="_blank" rel="noopener">yoursite.com</a>)</li><li>A subdomain can be a record (<a href="http://www.yoursite.com" target="_blank" rel="noopener">www.yoursite.com</a>)</li><li>A wildcard domain can be a record (<strong>*.yourwebsite</strong>)</li></ul></li><li>Record Type:<ul><li>NS: the name servers for the zone</li><li>A / AAAA: IPv4 / IPv6 address</li><li>CNAME: points to another record name</li><li>Alias: points to other AWS services: ELB; Elastic Beanstalk; CloudFront (the alternate domain name must match the Route53 hosted domain name); S3 (the bucket name must match the Route53 hosted domain name)</li><li>MX record: email</li><li>TXT record: email validation; web analytics; certificates</li></ul></li></ul></li><li><p>Delegate to Route53</p><ul><li>For domains registered with Route53, a record is created automatically with<ul><li>Record name = your domain name</li><li>Record type: NS</li><li>Value: the <strong>delegation set</strong>, the list of Route53 name servers assigned to you (each customer gets a different set)</li></ul></li><li>For domains with another registrar, update the name servers in that registrar's console to AWS's <strong>delegation set</strong></li></ul></li><li><p>Wait: for domains registered with another registrar, wait up to 48 hours.</p></li></ol><h2 id="useful-tools-to-debug"><a class="markdownIt-Anchor" href="#useful-tools-to-debug"></a> Useful tools to debug</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> check <span class="keyword">if</span> the domain can be correctly resolved</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> dig google.com</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> Check <span class="keyword">if</span> the name server is correctly configured</span></span><br><span class="line"><span class="meta">$</span><span 
class="bash"> dig NS google.com</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash"> trace the resolution path: from the root to com to google.com</span></span><br><span class="line"><span class="meta">$</span><span class="bash"> dig NS google.com +trace</span></span><br></pre></td></tr></table></figure><h1 id="private-dns"><a class="markdownIt-Anchor" href="#private-dns"></a> Private DNS</h1><p>Option 1:</p><ul><li>Set up Unbound inside the VPC as the forwarder</li><li>Requests from on-premises go through the forwarder to the VPC +2 resolver (the Amazon DNS server at the VPC CIDR base address plus two), which resolves them with Route53 and returns the answer</li></ul><p>Option 2:</p><ul><li>Set up AWS Active Directory as the forwarder</li></ul><h1 id="advanced-features"><a class="markdownIt-Anchor" href="#advanced-features"></a> Advanced features</h1><h2 id="check-health"><a class="markdownIt-Anchor" href="#check-health"></a> Check health</h2><ul><li>Check an endpoint; check CloudWatch alarms; check other health checks</li><li>Use health checks to implement failover: active/active or active/standby</li><li>New feature (2015): calculated health checks combine the results of multiple health checks (AND/OR logic)<ul><li>Also supports latency</li></ul></li></ul><h2 id="dynamic-route"><a class="markdownIt-Anchor" href="#dynamic-route"></a> Dynamic Routing</h2><ul><li>Route internet traffic to your AWS resources (Simple; Latency-based; Geo-based; Weighted round robin; Failover)</li></ul><h2 id="traffic-flow"><a class="markdownIt-Anchor" href="#traffic-flow"></a> Traffic Flow</h2><ul><li>Graphical tool to help define Route53 policies</li></ul><h1 id="bills"><a class="markdownIt-Anchor" href="#bills"></a> Bills</h1><ul><li><p>DNS service</p><ul><li>You pay for the hosted zones you configure and the number of queries Route 53 answers</li></ul></li><li><p>Route53 service limits (50 domains per account)</p></li></ul><h1 id="use-cases"><a class="markdownIt-Anchor" 
href="#use-cases"></a> Use Cases</h1><p>Warner Bros: many domains (&gt;25,000); some zones have more than 10,000 records</p><ul><li>Set up more than 150 accounts to isolate applications, billing and security</li><li>Moved from an on-premises solution (BIND 9) that suffered from HA and failover problems, no API (for automation) and no self-service features</li><li>Migration key points<ul><li>Raise some Route53 limits (hosted zones per account, etc.)</li><li>Plan and tools: a tool to batch-migrate, a tool to validate</li><li>Lower the TTL during migration</li><li>Open-source tool: <strong>cli53</strong></li><li>Up-front investment in automation resulted in a smooth, error-free migration.</li></ul></li><li>Review: <strong>Catchpoint</strong> DNS monitoring showed a huge improvement</li></ul><h1 id="migrate-existing-dns-to-route53"><a class="markdownIt-Anchor" href="#migrate-existing-dns-to-route53"></a> Migrate existing DNS to Route53</h1><p>Correct sequence to guarantee <strong>availability</strong></p><ol><li>Import DNS records into Route53<ul><li>Manually or with cli53</li></ul></li><li>Delegate to Route53<ul><li>After import, each zone has its own name servers; use the registrar-specific command line to delegate to the Route53 name servers</li><li>Use dig +trace to verify that the delegation succeeded</li></ul></li><li>Transfer the domain to AWS Route53<ul><li>Use the registrar-specific command to transfer the domain</li></ul></li></ol><h1 id="hybrid-environment"><a class="markdownIt-Anchor" href="#hybrid-environment"></a> Hybrid environment</h1><blockquote><p><a href="https://youtu.be/XXUYbdbCb6Q" target="_blank" rel="noopener">https://youtu.be/XXUYbdbCb6Q</a></p></blockquote><ul><li>Option 1: use the <strong>unbound</strong> service as a DNS forwarder (installed on Linux with yum)</li><li>Option 2: use the <strong>AWS Active Directory</strong> service as the forwarder</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><ul><li>071.mp4 072.mp4 – Overview</li></ul><blockquote><p><a 
href="https://youtu.be/QU7FQBgL0Po" target="_blank" rel="noopener">https://youtu.be/QU7FQBgL0Po</a></p></blockquote><ul><li>DNS Demystified: Amazon Route 53</li></ul><blockquote><p><a href="https://youtu.be/AAq-DDbFiIE" target="_blank" rel="noopener">https://youtu.be/AAq-DDbFiIE</a></p></blockquote><ul><li>Migrate to Route53 (??? Resolver forwarder)</li></ul><blockquote><p><a href="https://youtu.be/XXUYbdbCb6Q" target="_blank" rel="noopener">https://youtu.be/XXUYbdbCb6Q</a></p></blockquote>]]></content>
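The "dig +trace to verify the delegation" migration step can also be scripted. The minimal shell sketch below assumes dig is installed. The helper names normalize_ns and ns_matches are hypothetical, and the four name servers in EXPECTED_NS are placeholders, not a real delegation set; replace them with the delegation set Route53 shows for your hosted zone. Both sides are lower-cased and stripped of the trailing dot that dig prints, then compared as sorted sets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: does the NS set a resolver returns match the
# Route53 delegation set? Inputs are newline-separated host names.
normalize_ns() {
  tr 'A-Z' 'a-z' | sed 's/\.$//' | sort -u
}

ns_matches() {
  local expected actual
  expected=$(printf '%s\n' "$1" | normalize_ns)
  actual=$(printf '%s\n' "$2" | normalize_ns)
  [ -n "$expected" ] || return 1      # refuse to compare against an empty set
  [ "$expected" = "$actual" ]
}

# Placeholder delegation set: replace with the one shown for your hosted zone.
EXPECTED_NS='ns-1.awsdns-00.com
ns-2.awsdns-00.net
ns-3.awsdns-00.org
ns-4.awsdns-00.co.uk'

# Real check (needs network access):
#   ns_matches "$EXPECTED_NS" "$(dig +short NS example.com)"

# Offline demo with output shaped like "dig +short NS" (trailing dots, mixed case):
actual='NS-1.AWSDNS-00.COM.
ns-2.awsdns-00.net.
ns-3.awsdns-00.org.
ns-4.awsdns-00.co.uk.'
if ns_matches "$EXPECTED_NS" "$actual"; then echo "delegation OK"; else echo "delegation MISMATCH"; fi
```

The comparison uses sets because name-server order in DNS answers is not meaningful. During a migration, a mismatch usually means the registrar still points at the old name servers or cached NS records have not expired yet.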
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Route 53 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Cloudwatch</title>
      <link href="2018/04/17/markdown/AWS/AWS2018/022_Cloudwatch/"/>
      <url>2018/04/17/markdown/AWS/AWS2018/022_Cloudwatch/</url>
      
        <content type="html"><![CDATA[<h1 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h1><p><strong>CloudWatch Metrics is a time-series data store.</strong></p><p>CloudWatch Alarms: based on metrics; the metric threshold combined with the evaluation period decides whether to trigger the alarm</p><ul><li>Integrated with SNS, email, etc.</li><li>States: <strong>OK; ALARM; INSUFFICIENT_DATA</strong></li></ul><h2 id="integrate-cloudwatch-with-3rd-party-monitoring-platform"><a class="markdownIt-Anchor" href="#integrate-cloudwatch-with-3rd-party-monitoring-platform"></a> Integrate CloudWatch with a 3rd-party monitoring platform</h2><p>Consider the following carefully when integrating:</p><ul><li>IAM permissions</li><li>API: a single request returns at most 1,440 data points. For a metric with a 1-minute period, one request can therefore retrieve at most one day of data (24*60 = 1440 data points).</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">aws cloudwatch list-metrics --metric-name EstimatedCharges</span><br><span class="line"><span class="meta">#</span><span class="bash"> period is <span class="keyword">in</span> seconds, so 300 is 5 min</span></span><br><span class="line">aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization \</span><br><span class="line">      --dimensions Name=InstanceId,Value=i-30c9605  \</span><br><span class="line">      --start-time "2014-10-11T00:00:00Z" --end-time "2014-10-12T00:00:00Z" \</span><br><span class="line">      --period 300 \</span><br><span class="line">      --statistics Average Maximum | more</span><br></pre></td></tr></table></figure><ul><li>Request throughput: control how much each request returns by adjusting the start time, end time and period so that the response fits within the limit.</li><li>Late-arriving data (<strong>BackFill</strong> feature)</li></ul><h2 id="cloudwatch-logs"><a class="markdownIt-Anchor" href="#cloudwatch-logs"></a> CloudWatch Logs</h2><p>Centralized logging (like ELK) for all AWS resources &amp; services.</p><p>Has a throughput limit of 1 MB/s.</p><ul><li><p>It’s built on <strong>Amazon Kinesis</strong></p><ul><li>An agent needs to be installed on the server<ul><li>Install it with wget (a Python script)</li><li>Run the agent, providing: which log folder to monitor; the log group name (for similar logs from a cluster); the stream name (e.g. the EC2 instance ID); the timestamp format; whether to start reading from the beginning or the end of the log file</li></ul></li><li>Monitor<ul><li>Once the agent has started, logs can be monitored from the AWS console.</li><li>Set data expiration for the log group (storage costs money)</li><li>Set a metric filter on log events to create metrics<ul><li>Filter pattern: e.g. match only log entries about invalid users and create a metric from them</li><li>Supports literal terms; common log format; JSON logs (like CloudTrail)</li></ul></li><li>On an existing metric, you can create an alarm (e.g. 2 invalid user logins within 5 minutes) that notifies an email address</li></ul></li><li>Access: the ARN for logs is <strong>arn:aws:logs:*:*:*</strong><ul><li>No need to log in to the remote server; you can tail the log through the AWS API (because it is already centralized)</li></ul></li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">aws logs get-log-events --log-group-name /var/log/secure --log-stream-name i-30c960d1 | grep &quot;invalid user&quot;</span><br></pre></td></tr></table></figure></li><li><p>CloudWatch Logs sources: EC2, CloudTrail, or other resources (like 
S3)</p></li><li><p>Use cases:</p><ul><li>monitoring for errors;</li><li>long-term off-box storage of logs;</li><li>tailing logs without connecting to the host;</li><li>correlating system status with change events (correlating changes with errors).</li></ul></li></ul><h3 id="sample-use-case-to-monitor-s3-logs"><a class="markdownIt-Anchor" href="#sample-use-case-to-monitor-s3-logs"></a> Sample use case: monitor S3 logs</h3><ol><li>Turn on S3 logging; the logs are delivered to another S3 bucket</li><li>From EC2, run a Python script to pull the logs and send them to CloudWatch</li><li>In CloudWatch, configure log metrics</li></ol><h3 id="sample-use-case-to-monitor-cloudtrail"><a class="markdownIt-Anchor" href="#sample-use-case-to-monitor-cloudtrail"></a> Sample use case: monitor CloudTrail</h3><p>Fully integrated, no parsing needed: just enable CloudTrail, specify the target S3 bucket and enable CloudWatch integration.</p><h3 id="sample-use-case-to-integrate-with-ec2-configuration-service"><a class="markdownIt-Anchor" href="#sample-use-case-to-integrate-with-ec2-configuration-service"></a> Sample use case: integrate with the EC2 configuration service</h3><p>Windows servers can also be</p><h2 id="cloudwatch-events"><a class="markdownIt-Anchor" href="#cloudwatch-events"></a> CloudWatch Events (???)</h2><p>CloudWatch Events vs CloudWatch Logs: what is the relationship?</p><ul><li>Source: a status change or a CloudTrail event</li><li>Rules: matching</li><li>Target: SNS/SQS; Kinesis; Lambda (Lambda targets supported only in some regions)</li></ul><h1 id="bill"><a class="markdownIt-Anchor" href="#bill"></a> Bill</h1><p>$0.30 per metric per month</p><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>066.mp4 067.mp4<br>068.mp4 - hands-on with CloudWatch</p></blockquote><blockquote><p><a href="https://youtu.be/pTzv-i1uvvE" target="_blank" rel="noopener">https://youtu.be/pTzv-i1uvvE</a></p></blockquote>]]></content>
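The 1,440-data-point ceiling per get-metric-statistics call determines how many requests a 3rd-party poller must make to cover a time range. A single request covers at most 1440 * period seconds, so the request count is the number of data points divided by 1440, rounded up. This is a back-of-the-envelope sketch in shell, and the function name requests_needed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum number of get-metric-statistics calls needed
# to cover RANGE_S seconds at PERIOD_S seconds per data point, given the
# 1440-data-points-per-request limit.
requests_needed() {
  local range_s=$1 period_s=$2 max_points=1440
  local points=$(( (range_s + period_s - 1) / period_s ))   # ceil(range / period)
  echo $(( (points + max_points - 1) / max_points ))        # ceil(points / 1440)
}

requests_needed 86400 60      # 1 day at a 1-minute period
requests_needed 604800 60     # 1 week at a 1-minute period
requests_needed 604800 300    # 1 week at a 5-minute period
```

The three calls print 1, 7 and 2. Each of those requests then gets its own --start-time/--end-time window, no longer than 1440 * period seconds.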
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Cloudwatch </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Big Data Solution</title>
      <link href="2018/04/17/markdown/AWS/AWS2018/023_BigDataSolution/"/>
      <url>2018/04/17/markdown/AWS/AWS2018/023_BigDataSolution/</url>
      
        <content type="html"><![CDATA[<h1 id="069mp4-overview"><a class="markdownIt-Anchor" href="#069mp4-overview"></a> 069.mp4 – overview</h1><ul><li>Data Storage: Redshift; DynamoDB; S3; RDS</li><li>Data Analysis: EMR; Elasticsearch; QuickSight BI; Amazon Machine Learning; Lambda</li><li>Data Streaming: Kinesis Streams</li></ul><h2 id="redshift"><a class="markdownIt-Anchor" href="#redshift"></a> Redshift</h2><ul><li>Petabyte scale</li><li>PostgreSQL-based</li><li>Continuously backed up to S3 with snapshots (retained 1-35 days)</li><li>Quick recovery from snapshots</li></ul><h2 id="emr-elastic-map-reduce"><a class="markdownIt-Anchor" href="#emr-elastic-map-reduce"></a> EMR (Elastic MapReduce)</h2><ul><li>Fully managed Hadoop service</li><li>Clusters can be terminated automatically when the task finishes</li><li>Data processing frameworks: Hadoop MapReduce &amp; Spark</li><li>Storage options: HDFS / EMRFS (S3-based) / EC2 local file system</li></ul><h2 id="elasticsearch"><a class="markdownIt-Anchor" href="#elasticsearch"></a> Elasticsearch</h2><ul><li>Data sources: S3, Kinesis Streams, DynamoDB Streams, CloudWatch Logs, CloudTrail</li><li>Not suitable for petabyte-scale storage</li></ul><h2 id="quicksight"><a class="markdownIt-Anchor" href="#quicksight"></a> QuickSight</h2><ul><li>BI reporting tool</li><li>SPICE (Super-fast, Parallel, In-memory Calculation Engine)</li></ul><h1 id="amazon-machine-learning"><a class="markdownIt-Anchor" href="#amazon-machine-learning"></a> Amazon Machine Learning</h1><ul><li>Predictive analytics</li><li>Data sources: Redshift; S3; RDS (MySQL)</li><li>Supported learning tasks: detecting suspicious transactions; forecasting product demand; personalizing content; predicting user activity; analyzing social media</li><li>Limited dataset size (not too large)</li><li>Alternative: EMR running Spark and MLlib</li></ul><h1 id="kinesis"><a class="markdownIt-Anchor" href="#kinesis"></a> Kinesis</h1><ul><li>Producers: HTTP PUT, SDK, C++ library, Java library (Kinesis 
agent)</li><li>Consumers: Java, Node.js, .NET, Python, Ruby</li><li>“Kinesis Firehose”: streams data into Kinesis Analytics, S3, Redshift, Elasticsearch</li></ul><p>Multi-AZ Redshift design; AZ and region considerations<br><a href="https://aws.amazon.com/blogs/big-data/building-multi-az-or-multi-region-amazon-redshift-clusters/" target="_blank" rel="noopener">https://aws.amazon.com/blogs/big-data/building-multi-az-or-multi-region-amazon-redshift-clusters/</a><br><a href="https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html</a></p><p>Redshift Schema<br><a href="https://docs.aws.amazon.com/redshift/latest/dg/r_Schemas_and_tables.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/redshift/latest/dg/r_Schemas_and_tables.html</a></p><p>Kinesis pricing: “Shard Hour” and “PUT Payload Unit” (and “extended data retention”)<br><a href="https://aws.amazon.com/kinesis/data-streams/pricing/" target="_blank" rel="noopener">https://aws.amazon.com/kinesis/data-streams/pricing/</a></p><p>Kinesis replicates data across multiple AZs<br><a href="https://aws.amazon.com/kinesis/data-streams/faqs/" target="_blank" rel="noopener">https://aws.amazon.com/kinesis/data-streams/faqs/</a></p><p>EMR persistent (long-running) clusters<br><a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-longrunning-transient.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-longrunning-transient.html</a></p><p>EMR pricing (number of nodes * number of seconds, 1-minute minimum)<br><a href="https://aws.amazon.com/emr/pricing/" target="_blank" rel="noopener">https://aws.amazon.com/emr/pricing/</a></p><h1 id="solution-pipeline-emr-redshift"><a class="markdownIt-Anchor" href="#solution-pipeline-emr-redshift"></a> Solution: Pipeline, EMR, Redshift</h1><h2 id="related-aws-services"><a class="markdownIt-Anchor" 
href="#related-aws-services"></a> Related AWS services</h2><ul><li>Collect: AWS Direct Connect; AWS Import/Export; Amazon Kinesis</li><li>Store: S3; DynamoDB; Glacier</li><li>Process &amp; Analyze: EMR; Redshift; EC2</li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/023_BigDataSolutionOnAWS.png?raw=true" alt="sample big data solution on AWS"></p><h2 id="work-through-a-case"><a class="markdownIt-Anchor" href="#work-through-a-case"></a> Work through a case</h2><p>Step 1: send log files to S3<br>Step 2: the pipeline runs a Pig script to parse the logs into CSV<br>Step 3: pipeline to EMR (AWS Hadoop)<br>Step 4: pipeline to Redshift</p><ul><li>Tips<br>Define pipelines in JSON: build once, deploy many<br>Make use of <strong>backfill</strong>: backfill is triggered when you set the scheduled start time in the past. The pipeline may then start multiple concurrent tasks to catch up with the data up to now.</li></ul><h2 id="user-experience-coursera"><a class="markdownIt-Anchor" href="#user-experience-coursera"></a> User experience - Coursera</h2><ul><li>Use <strong>Dataduct</strong> (a tool open-sourced by Coursera) for concise ETL definitions</li><li>Create pipelines programmatically</li><li>Extract reusable steps</li><li>SQL-&gt;S3-&gt;EMR-&gt;S3-&gt;Redshift</li><li>Persist Redshift logs to<ul><li>track who is responsible for which schema, etc.</li><li>let users know how fresh the data is.</li></ul></li><li>Data quality: GIGO (garbage in, garbage out; best practice is to fix the issue in the source system)</li><li>Automated QA checks</li></ul><p>Benefits of using Data Pipeline</p><ul><li>Starts and stops resources automatically</li><li>Handles access management</li><li>Integrates with AWS services</li></ul><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><blockquote><p>Big data solution with Data Pipeline, EMR, Redshift<br><a href="https://youtu.be/oOIgMSv2rug" target="_blank" 
rel="noopener">https://youtu.be/oOIgMSv2rug</a></p></blockquote><h1 id="solution-big-data-solution-for-justgiving-website"><a class="markdownIt-Anchor" href="#solution-big-data-solution-for-justgiving-website"></a> Solution: big data solution for the JustGiving website</h1><h2 id="challenges"><a class="markdownIt-Anchor" href="#challenges"></a> Challenges</h2><ul><li>Support various data sources (API clicks, logs, behavioural data, etc.)</li><li>Performance</li><li>Ease of data preparation</li></ul><h2 id="pain-dag-directed-acyclic-graph-like-a-data-flow"><a class="markdownIt-Anchor" href="#pain-dag-directed-acyclic-graph-like-a-data-flow"></a> Pain: DAG (Directed Acyclic Graph), like a data flow</h2><p>Solving the pain:</p><ul><li>Use an event-driven, serverless pipeline<ul><li>Separate storage from compute</li><li>Use messaging and pub/sub patterns:<ul><li>Pattern 1: when data is ready, send a notification to a topic that triggers loading the data into Redshift</li><li>Pattern 2: when data is ready, queue an EMR task in SQS, then run EMR tasks from the queue to process the data</li><li>Pattern 3: data arrives on Kinesis, is processed by EMR, and a notification is sent when processing finishes</li><li>Pattern 4: S3 -&gt; Lambda -&gt; S3 (suitable for small files)</li><li>Pattern 5: serverless streaming of small files, merged into larger files: S3 -&gt; Kinesis -&gt; Firehose -&gt; S3</li><li>Pattern 6: add S3 static website hosting to publish the analysis results above</li></ul></li></ul></li><li>Support both ETL and ELT</li></ul><h2 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h2><blockquote><p>Big data solution with: S3, Lambda, Redshift, EMR, Kinesis<br><a href="https://youtu.be/YGNu6SLCk50" target="_blank" rel="noopener">https://youtu.be/YGNu6SLCk50</a></p></blockquote>]]></content>
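The Kinesis "Shard Hour" price scales with the number of shards, so sizing the stream comes first. The sizing sketch below assumes the standard per-shard limits: 1 MiB/s or 1,000 records/s for writes, and 2 MiB/s for reads. Check the current Kinesis Data Streams quotas before relying on these numbers. The function names ceil_div and shards_needed are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: shards required for a Kinesis data stream.
# Assumed per-shard limits: write 1024 KiB/s or 1000 records/s, read 2048 KiB/s.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

shards_needed() {
  local in_kib_s=$1 out_kib_s=$2 records_s=$3
  local n by_out by_rec
  n=$(ceil_div "$in_kib_s" 1024)          # shards needed for write bandwidth
  by_out=$(ceil_div "$out_kib_s" 2048)    # shards needed for read bandwidth
  by_rec=$(ceil_div "$records_s" 1000)    # shards needed for write record rate
  if [ "$by_out" -gt "$n" ]; then n=$by_out; fi
  if [ "$by_rec" -gt "$n" ]; then n=$by_rec; fi
  if [ "$n" -lt 1 ]; then n=1; fi         # a stream has at least one shard
  echo "$n"
}

# 5 MiB/s in, 20 MiB/s out (e.g. several consumers), 3000 records/s
shards_needed 5120 20480 3000
```

In this example, the read side drives the result to 10 shards. Multiplying that count by the hourly shard price gives the "Shard Hour" part of the bill, before PUT payload units are added.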
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> Big Data </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Simple Notification Service</title>
      <link href="2018/04/16/markdown/AWS/AWS2018/020_SimpleNotificationService/"/>
      <url>2018/04/16/markdown/AWS/AWS2018/020_SimpleNotificationService/</url>
      
        <content type="html"><![CDATA[<h1 id="061mp4-062mp4-overview"><a class="markdownIt-Anchor" href="#061mp4-062mp4-overview"></a> 061.mp4 062.mp4 – overview</h1><ul><li><p>Messages are published to a topic via the SDK, CLI, or Console</p></li><li><p>Subscribers: SQS (FIFO queues not supported); Email (formats: Email, Email-JSON); Mobile; HTTP(S); Lambda; SMS</p></li><li><p>A message includes:</p><ul><li>MessageId, Timestamp, TopicArn, Type, UnsubscribeURL, Message, Subject, Signature, SignatureVersion</li></ul></li><li><p>SNS mobile push notification steps:</p><ul><li>Request credentials from the mobile platforms</li><li>Request a token (ADM or GCM registration ID; APNS device token)</li></ul></li><li><p>ADM (Amazon Device Messaging): push to Kindle</p></li><li><p>APNS: push to iOS devices</p></li><li><p>GCM: push to Android</p></li></ul><p>High-level steps<br><a href="https://docs.aws.amazon.com/sns/latest/dg/mobile-push-pseudo.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/sns/latest/dg/mobile-push-pseudo.html</a></p><p>Size limits for SMS (140 bytes per SMS, 1600 bytes per message)<br><a href="https://docs.aws.amazon.com/sns/latest/dg/sms_publish-to-phone.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/sns/latest/dg/sms_publish-to-phone.html</a></p><p>HTTP(S) protocol: user name and password (basic authentication)<br><a href="https://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> SNS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Simple WorkFlow Service</title>
      <link href="2018/04/16/markdown/AWS/AWS2018/019_SimpleWorkFlowService/"/>
      <url>2018/04/16/markdown/AWS/AWS2018/019_SimpleWorkFlowService/</url>
      
        <content type="html"><![CDATA[<h1 id="059mp4-060mp4-swf-overview"><a class="markdownIt-Anchor" href="#059mp4-060mp4-swf-overview"></a> 059.mp4 060.mp4 - SWF overview</h1><p>SWF: <strong>Simple WorkFlow Service</strong></p><ul><li>Long-running processes</li><li>Interacts with AWS, users, and on-premises infrastructure</li><li>Workflow engine</li><li>Workflows and sub-workflows</li><li>SWF domain: one or more workflows</li><li>Actors:<ul><li>Starter</li><li>Decider</li><li>Worker</li></ul></li><li>Tasks:<ul><li>Register via the console or CLI (RegisterActivityType)</li><li>Specify a queue for the task</li><li>Use “Task Routing” to route tasks to a specific worker</li></ul></li><li>Implementation / setup<ul><li>Implementation: SDK; API calls; framework (Java / Ruby)</li><li>Setup: CLI or console</li></ul></li><li>A scenario:<ul><li>worker uploads video --&gt; transform --&gt; review --&gt; online</li></ul></li></ul><p>Steps to develop and run a workflow<br><a href="https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html</a></p><p>SWF limits (number of domains; request size; workflow execution time)<br><a href="https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-limits.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-limits.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> SWF </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Simple Queue Service</title>
      <link href="2018/04/16/markdown/AWS/AWS2018/018_SimpleQueueService/"/>
      <url>2018/04/16/markdown/AWS/AWS2018/018_SimpleQueueService/</url>
      
        <content type="html"><![CDATA[<h1 id="058mp4-sqs-overview"><a class="markdownIt-Anchor" href="#058mp4-sqs-overview"></a> 058.mp4 – SQS Overview</h1><ul><li><p>Up to 10 attributes can be added to a message</p></li><li><p>Size: 1 byte to 256 KB</p></li><li><p>Standard usage: provide a CloudWatch metric (queue depth) to drive Auto Scaling</p></li><li><p>Queue types</p><ul><li>Standard queue</li><li>FIFO (max 300 TPS, exactly-once processing); not available in all regions</li></ul></li><li><p>Message lifecycle (visibility timeout)</p></li><li><p>Dead-letter queue: must be in the same region and under the same account as the source queue</p></li><li><p>Delay queue</p><ul><li>Define “DelaySeconds”</li><li>Max in-flight messages = 120,000</li></ul></li><li><p>Message timers: make an individual message available after a delay by setting “DelaySeconds” on that message</p></li><li><p>Two types of polling</p><ul><li>Short polling: “Any messages?” The server answers immediately: “Yes” / “No”</li><li>Long polling: “Any messages? If one arrives within 20 seconds, reply to me; I’ll wait.” Server: “OK.”<ul><li>Long polling reduces empty responses and false empty responses (short polling queries only a subset of servers)</li><li>Set “WaitTimeSeconds” to 1 to 20 seconds</li></ul></li></ul></li></ul><p>How a change to “DelaySeconds” affects existing messages (different behavior for Standard and FIFO)<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html</a></p><p>Deleting the queue: what happens to existing messages<br><a href="https://docs.aws.amazon.com/cli/latest/reference/sqs/delete-queue.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/cli/latest/reference/sqs/delete-queue.html</a></p><p>Message retention &amp; visibility timeout<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-limits.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-limits.html</a></p><p>Short 
Polling definition: (subset of servers)<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html</a></p><p>Queue identifier (format)</p><blockquote><p><a href="https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue" target="_blank" rel="noopener">https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue</a></p></blockquote><p><a href="https://sqs.regionname.amazonaws.com/accountnumber/queuename.uniqueforuser%5B.fifo%5D" target="_blank" rel="noopener">https://sqs.regionname.amazonaws.com/accountnumber/queuename.uniqueforuser[.fifo]</a><br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-general-identifiers.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-general-identifiers.html</a></p><p>Message ID vs Receipt Handle<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-general-identifiers.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-general-identifiers.html</a></p><p>How to check queue depth (GetQueueAttributes &amp; ApproximateNumberOfMessages)<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_GetQueueAttributes.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_GetQueueAttributes.html</a></p><p>Dead-letter queue limitations (the original queue type determines the dead-letter queue type; region &amp; account limitations)<br>A FIFO queue’s dead-letter queue must also be FIFO<br>The dead-letter queue must be in the same region as the original queue<br>The dead-letter queue must be created by the same account as the original queue<br><a 
href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html</a></p><p>CloudWatch integration with SQS (how often metrics are pushed; how to tell if a queue is active; no charge; supports all queue types)<br>Metrics are pushed every 5 minutes<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-monitoring-using-cloudwatch.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-monitoring-using-cloudwatch.html</a></p><p>Integration with CloudTrail (what gets logged)<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/logging-using-cloudtrail.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/logging-using-cloudtrail.html</a></p><p>Message visibility range<br><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ChangeMessageVisibility.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ChangeMessageVisibility.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> SQS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - OpsWorks</title>
      <link href="2018/04/15/markdown/AWS/AWS2018/017_OpsWorks/"/>
      <url>2018/04/15/markdown/AWS/AWS2018/017_OpsWorks/</url>
      
        <content type="html"><![CDATA[<h1 id="opsworks-overview"><a class="markdownIt-Anchor" href="#opsworks-overview"></a> OpsWorks Overview</h1><ul><li>Uses Chef recipes</li><li>A better, more fine-grained way to define infrastructure (compared to Elastic Beanstalk)</li></ul><h2 id="cm-model"><a class="markdownIt-Anchor" href="#cm-model"></a> CM Model</h2><ul><li><p>CM model (configuration management)</p><ul><li>Stack: a set of instances and applications</li><li>Layers: reusable subcomponents of a stack</li><li>Instances: can participate in multiple layers</li><li>Apps: code running on the servers</li></ul></li><li><p>Scaling</p><ul><li>Manual scaling</li><li>Automatic scaling: time-based; load-based</li><li>Both can be combined</li></ul></li><li><p>Chef recipes: infrastructure as code</p></li></ul><h1 id="deepdive"><a class="markdownIt-Anchor" href="#deepdive"></a> Deep Dive</h1><ul><li><p>Difference from Chef Server (there is no Chef server)</p><ul><li>Can be agentless (push model)</li></ul></li><li><p>Push JSON-format events that define the target state for each stage of the server lifecycle:</p><ul><li>Setup event</li><li>Configure event</li><li>Deploy event</li><li>Undeploy event</li><li>Shutdown event</li></ul></li></ul><h1 id="opsworks-hands-on"><a class="markdownIt-Anchor" href="#opsworks-hands-on"></a> OpsWorks hands-on</h1><ul><li>Create the first stack</li><li>Create a sample stack</li><li>Check the Git repo for the application and the Git repo for the infrastructure</li><li>Quicker than Elastic Beanstalk or containers; a high-end option</li></ul><h1 id="improvements"><a class="markdownIt-Anchor" href="#improvements"></a> Improvements</h1><p>Separate environments for AWS Chef recipes and customer recipes to avoid conflicts</p><p>First 10,000 metrics: $0.30<br>$3/dashboard/month<br>Regular (5 min): $0.10/alarm<br>$0.50/GB</p><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>OpsWorks 2015 under the hood</p></blockquote><blockquote><p><a 
href="https://youtu.be/WxSu015Zgak" target="_blank" rel="noopener">https://youtu.be/WxSu015Zgak</a></p></blockquote><blockquote><p>054.mp4 055.mp4 056.mp4</p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> OpsWorks </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - ElasticBeanStalk</title>
      <link href="2018/04/15/markdown/AWS/AWS2018/016_ElasticBeanStalk/"/>
      <url>2018/04/15/markdown/AWS/AWS2018/016_ElasticBeanStalk/</url>
      
        <content type="html"><![CDATA[<h1 id="052mp4-053mp4-elasticbeanstalk"><a class="markdownIt-Anchor" href="#052mp4-053mp4-elasticbeanstalk"></a> 052.mp4 053.mp4 – Elastic Beanstalk</h1><p>Helps provision resources to run applications such as Docker, Node.js, Java, etc.,<br>with pre-configured failover and load-balancing options.</p><p>Admin access<br><a href="http://jayendrapatil.com/aws-root-access-enabled-services/" target="_blank" rel="noopener">http://jayendrapatil.com/aws-root-access-enabled-services/</a></p><p>No additional charge</p><p>Beanstalk-integrated DB definition<br><a href="https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.db.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.db.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> ElasticBeanStalk </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudFormation</title>
      <link href="2018/04/12/markdown/AWS/AWS2018/14_CloudFormation/"/>
      <url>2018/04/12/markdown/AWS/AWS2018/14_CloudFormation/</url>
      
        <content type="html"><![CDATA[<h1 id="cloudformation-overview"><a class="markdownIt-Anchor" href="#cloudformation-overview"></a> CloudFormation Overview</h1><p>A JSON-format definition of which services need to be deployed<br>Infrastructure as code</p><ul><li><p>Supports JSON or YAML format</p><ul><li>AWSTemplateFormatVersion: format version</li><li>Description</li><li>Parameters: define parameters at stack creation time</li><li>Mappings: predefined key/value pairs, e.g., used to look up values by region in the Resources section</li><li>Conditions: for example, if envcode=prod, mount the disk when defining the EC2 instance</li><li>“Resources” is the only mandatory section in a CloudFormation template.</li><li>Outputs: can be imported into other stacks or reported/shown on the console</li></ul></li><li><p>CloudFormer</p><ul><li>Only supports JSON</li><li>Exports selected services of the current account into a CloudFormation template</li><li>Visual tool to edit the CloudFormation JSON template</li></ul><p>Sensitive parameters (NoEcho)<br><a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html</a></p><p>Automatic rollback on failure<br><a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/troubleshooting.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/troubleshooting.html</a></p></li></ul><h1 id="049mp4-050mp4-051mp4-hands-on-cloudformation"><a class="markdownIt-Anchor" href="#049mp4-050mp4-051mp4-hands-on-cloudformation"></a> 049.mp4 050.mp4 051.mp4 - hands-on CloudFormation</h1><ul><li><p>Create a CloudFormation template using a JSON editor</p><ul><li>Manually add “AWSTemplateFormatVersion”, “Description”, “Parameters”, “Resources”, “Outputs”</li><li>Pre-defined (copied) template: define a DynamoDB table with<ul><li>Parameters: 
ReadCapacityUnit and WriteCapacityUnit</li><li>Resources: DynamoDB table with attribute definitions</li><li>Outputs: print the table name</li></ul></li><li>From the AWS console, on the “CloudFormation” service main page, select “Create a new stack”<ul><li>The template can come from a sample, an upload to S3, or a URL (hosted in S3)</li><li>Rollback on failure is “On” by default</li></ul></li><li>Switch to the DynamoDB console and check that the table was created as expected.</li></ul></li><li><p>Create a CloudFormation template using CloudFormer</p><ul><li>Create a stack, select “template from sample”, and choose the “CloudFormer” template<ul><li>Provide a user name and password to log in to CloudFormer (creation took 8 minutes in the video)</li><li>The output contains the URL of the CloudFormer web page</li></ul></li><li>Log in to CloudFormer<ul><li>Create a template</li><li>Go through the wizard to select existing resources to form the new template</li><li>Note: it won’t define parameters (values are copied directly from the existing resources)</li></ul></li><li>Launch the stack using the template created by CloudFormer</li></ul></li></ul><h1 id="cloudformation-desinger"><a class="markdownIt-Anchor" href="#cloudformation-desinger"></a> CloudFormation Designer</h1><ul><li>Visualize the design</li><li>Can be used to update an existing stack</li></ul><h1 id="extend-with-custom-resources"><a class="markdownIt-Anchor" href="#extend-with-custom-resources"></a> Extend with custom resources</h1><p>Lets you create resources that implement AWS-defined create/update/rollback/delete handling and metadata</p><p>Standard scenario: use Lambda</p><h1 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h1><p>Policy example</p><ul><li>Limit a user to creating stacks only from templates in a certain bucket</li><li>Limit a user to updating a stack only with a specific YAML file from a certain bucket</li></ul><p>IAM policy example</p><ul><li>Limit the types of resources the user can create</li></ul><h1 
id="best-practise"><a class="markdownIt-Anchor" href="#best-practise"></a> Best Practice</h1><ul><li>Reuse across regions</li><li>Pseudo parameters (use environment values such as AWS::Region)</li><li>Use mappings</li><li>Use conditions</li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>047.mp4 048.mp4</p></blockquote><blockquote><p>CloudFormation Designer<br><a href="https://youtu.be/fVMlxJJNmyA" target="_blank" rel="noopener">https://youtu.be/fVMlxJJNmyA</a></p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CloudFormation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - EFS</title>
      <link href="2018/04/12/markdown/AWS/AWS2018/13_EFS/"/>
      <url>2018/04/12/markdown/AWS/AWS2018/13_EFS/</url>
      
        <content type="html"><![CDATA[<h1 id="045mp4-elastic-file-system-is-a-nas"><a class="markdownIt-Anchor" href="#045mp4-elastic-file-system-is-a-nas"></a> 045.mp4 – Elastic File System: a NAS</h1><ul><li>EFS is a NAS (Network Attached Storage; a file system); S3 &amp; Glacier are object stores; EBS is block storage</li><li>EFS can be shared by multiple EC2 instances</li><li>EFS can grow and shrink; throughput scales automatically</li><li>Pay as you go (no minimum)</li><li>As a NAS, it supports thousands of connections</li><li>Multi-AZ replication</li></ul><p>Once mounted through a mount target, it appears as a network storage resource.</p><h2 id="security-control"><a class="markdownIt-Anchor" href="#security-control"></a> Security Control</h2><h1 id="046mp4-handson-on-efs"><a class="markdownIt-Anchor" href="#046mp4-handson-on-efs"></a> 046.mp4 – Hands-on with EFS</h1><ul><li>Mount EFS into a specific subnet in the VPC (it has to be a subnet)<ul><li>Create the EFS file system</li><li>Launch an EC2 instance</li><li>Security settings</li></ul></li></ul><h2 id="from-qa"><a class="markdownIt-Anchor" href="#from-qa"></a> From Q&amp;A</h2><ul><li>To load data from on-premises to EFS, you can use SCP (Secure Copy)</li><li>For old EC2-Classic instances not inside a VPC, you can use ClassicLink to connect to EFS</li><li>On-premises servers can connect to EFS via AWS Direct Connect</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> EFS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - DynamoDB</title>
      <link href="2018/04/11/markdown/AWS/AWS2018/10_DynamoDB/"/>
      <url>2018/04/11/markdown/AWS/AWS2018/10_DynamoDB/</url>
      
        <content type="html"><![CDATA[<h1 id="dynamodb-deepdive"><a class="markdownIt-Anchor" href="#dynamodb-deepdive"></a> DynamoDB Deep Dive</h1><ul><li>RDBMS is optimized for storage; NoSQL is optimized for compute</li><li>DynamoDB supports both key-value and document data models</li></ul><h2 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h2><p>Keys:</p><ul><li><p><strong>Partition Key</strong>; <strong>Hash Key</strong>;</p></li><li><p><strong>Sort Key</strong>;</p></li><li><p><strong>Range Key</strong>;</p><ul><li><strong>Local secondary index (LSI)</strong></li><li><strong>Global secondary index (GSI)</strong> (max 5 GSIs per table)</li><li>If data size &gt; 10 GB, use a GSI</li><li><strong>Primary Key</strong> = partition key + sort key</li></ul></li><li><p>Attribute;</p></li><li><p><strong>composite attributes</strong>; <strong>composite key</strong>: a way of constructing the partition key</p></li><li><p>The <strong>Partition Key</strong> decides which partition an item belongs to (building an unordered hash index)</p></li><li><p>A partition has a 10 GB limit; if the total storage for the same partition key exceeds the limit, the <strong>sort key</strong> is used.</p></li><li><p>Table --&gt; Items --&gt; Attributes --&gt; Partition key (mandatory attribute); Sort key (optional)</p></li><li><p>Each DynamoDB partition has 3 copies in total (including itself); on a write, you get a success response once 2 writes succeed.</p></li><li><p><strong>Hash Range Table</strong>: a table where hash key + range key identify an item.</p></li></ul><h2 id="scaling"><a class="markdownIt-Anchor" href="#scaling"></a> Scaling</h2><ul><li>Scaling on <strong>throughput</strong>: WCU and RCU (CU = Capacity Unit)<ul><li>Partitions needed = ROUNDUP((total RCU / 3000) + (total WCU / 1000))</li></ul></li><li>Scaling on size (max item size = 400 KB, max partition size = 10 GB)</li><li>Final partitions = MAX(partitions by throughput, partitions by size)</li><li>Heat map: shows, by time and 
partition dimension, which data is being requested. If all data access is concentrated on a specific partition, we have <strong>Hot Keys</strong>, which we should avoid by redesigning the partition key.</li><li>DynamoDB <strong>burst capacity</strong> is built in</li></ul><h1 id="data-modeling"><a class="markdownIt-Anchor" href="#data-modeling"></a> Data Modeling</h1><p>Store the data the way you will access it.</p><ul><li>New feature since 2015: support for documents (JSON)</li></ul><h1 id="patterns"><a class="markdownIt-Anchor" href="#patterns"></a> Patterns</h1><ul><li><p>Use Lambda as a DynamoDB stored procedure</p></li><li><p>Real-time voting</p><ul><li>Write sharding (add a random suffix to the key so data is spread across multiple partitions)</li></ul></li><li><p>Event logging: don’t mix hot data and cold data</p><ul><li>Time-series table (static TTL timestamps)<ul><li>Archive cold data to S3</li><li>Pre-create daily, weekly, or monthly tables and provision throughput according to which table is current</li></ul></li><li>Time-series table (dynamic TTL timestamps), where data gets updated<ul><li>Have a GSIKey to label recently updated data (??? 36min:36s)</li><li>Put the GSI key as the partition key and the update timestamp as the sort key</li><li>Use Lambda to filter out expired data and rotate it into the data lake</li></ul></li></ul></li><li><p>Product catalog: Black Friday</p><ul><li>Use a cache; the cache-update logic can be implemented in Lambda</li></ul></li><li><p>Common pattern: online gaming (filter)</p><ul><li>Problem: filtering on a non-sort-key attribute. The engine reads all records matching the key condition before filtering<ul><li>Solution 1: create a composite key (in MongoDB this is called a compound index)</li><li>Solution 2: <strong>Sparse indexes</strong></li></ul></li></ul></li><li><p>Messaging app: mixing small and large attributes (anti-pattern)</p><ul><li>Large attributes consume more RCU</li><li>Solution: split the table. Separate metadata and message body into different tables. 
Provision RCU accordingly.</li></ul></li><li><p>Sparse index: an index created on an optional attribute</p><ul><li>For example, game-score-table has a column called award. To filter by award, we want an Award GSI even though this field is likely to be null for most items.</li></ul></li></ul><h2 id="mvcc-transaction-model-in-nonsql-db"><a class="markdownIt-Anchor" href="#mvcc-transaction-model-in-nonsql-db"></a> MVCC: transaction model in a NoSQL DB</h2><ul><li>Multi-version concurrency control<ul><li>For example, creating partition locks.<ul><li>Manage versioning across items with metadata</li><li>Tag attributes to maintain multiple versions</li><li>Code your app to recognize when updates are in progress</li><li>App-layer error handling and recovery logic</li></ul></li></ul></li></ul><h1 id="new-feature-dynamodb-streams"><a class="markdownIt-Anchor" href="#new-feature-dynamodb-streams"></a> New Feature: DynamoDB Streams</h1><ul><li>Stream of updates to a table (asynchronous)</li><li>Exactly once; strictly ordered (per item)</li><li>24-hour lifetime</li><li>Cross-region replication uses this feature</li><li>Even multi-master scenarios</li><li>Pattern: use Lambda to read the stream and apply updates</li><li>Pattern: use Kinesis to consume the stream and aggregate into multiple targets</li><li>Use DynamoDB Streams to update ElastiCache content</li><li>ListStreams / DescribeStream / GetShardIterator…</li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>DynamoDB deep dive (very good)<br><a href="https://youtu.be/bCW3lhsJKfw" target="_blank" rel="noopener">https://youtu.be/bCW3lhsJKfw</a></p></blockquote><blockquote><p>DynamoDB deep dive<br><a href="https://youtu.be/VuKu23oZp9Q" target="_blank" rel="noopener">https://youtu.be/VuKu23oZp9Q</a></p></blockquote><blockquote><p>How to choose the right key<br><a href="https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/" target="_blank" 
rel="noopener">https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/</a></p></blockquote><h1 id="038mp4-039mp4-dynamodb-overview"><a class="markdownIt-Anchor" href="#038mp4-039mp4-dynamodb-overview"></a> 038.mp4 039.mp4 – DynamoDB Overview</h1><h2 id="no-sql-db"><a class="markdownIt-Anchor" href="#no-sql-db"></a> NoSQL DB</h2><ul><li>Secondary indexes</li><li>Atomic counters</li></ul><p>Terminology:</p><ul><li>Per table you can have max 5 global secondary indexes and max 5 local secondary indexes<ul><li>Global means across partitions</li></ul></li><li>Query and Scan<ul><li>Query is recommended; a query has to specify the partition key</li><li>Scan reads the whole table (no partition key, so no partition filter)</li><li>Both are eventually consistent by default; you can request a strongly consistent query/scan</li></ul></li><li>Atomic counters and conditional writes<ul><li><a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters" target="_blank" rel="noopener">https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters</a></li></ul></li></ul><h1 id="040mp4-dynamodb-handson"><a class="markdownIt-Anchor" href="#040mp4-dynamodb-handson"></a> 040.mp4 – DynamoDB Hands-on</h1><ul><li><p>Create a table</p></li><li><p>Select primary key &amp; sort key; index key &amp; sort key</p></li><li><p>Add an item from the console</p></li><li><p>Prepare a JSON file to bulk-upload data</p></li><li><p>Use the command line to batch-upload data</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">aws dynamodb batch-write-item --request-items file://path/to/prepared.json</span><br></pre></td></tr></table></figure></li><li><p>Use the command line to query the table</p></li></ul><p>Supported data types<br><a 
href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.DataTypes.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.DataTypes.html</a></p><p>Table naming rules<br><a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules" target="_blank" rel="noopener">https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules</a><br>Letters, digits, underscore (_), dash (-), and dot (.)</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> DynamoDB </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - VPC</title>
      <link href="2018/04/11/markdown/AWS/AWS2018/12_VPC/"/>
      <url>2018/04/11/markdown/AWS/AWS2018/12_VPC/</url>
      
        <content type="html"><![CDATA[<h1 id="vpc-deepdive-2016"><a class="markdownIt-Anchor" href="#vpc-deepdive-2016"></a> VPC deepdive 2016</h1><blockquote><p><a href="https://youtu.be/Qep11X1r1QA" target="_blank" rel="noopener">https://youtu.be/Qep11X1r1QA</a><br>very deep session about BGP, VPN, Direct connect. Loads of technical details</p></blockquote><h2 id="difference-between-ipsec-vpn-and-directconnect"><a class="markdownIt-Anchor" href="#difference-between-ipsec-vpn-and-directconnect"></a> Difference between IPSec VPN and DirectConnect</h2><h2 id="hardware-vpn"><a class="markdownIt-Anchor" href="#hardware-vpn"></a> Hardware VPN</h2><ul><li>2 tunnels/VPC; each tunnel will connect to one AZ</li><li>0.05/hours/VPN (that includes 2 tunnels); ( EC2 medium is 0.1/hour)</li><li>support static VPN and dynamic VPN (BGP)</li></ul><h3 id="static-vpn-vs-dynamic-vpn"><a class="markdownIt-Anchor" href="#static-vpn-vs-dynamic-vpn"></a> Static VPN vs Dynamic VPN</h3><ul><li>Static VPN<ul><li>IP address is Static</li><li>each tunnel need 2 pairs of Security Association (inbound and outbound); that means 1 VPC connection needs 4 SA pairs</li></ul></li><li>Dynamic VPN<ul><li>BGP IP address is dynamically generated</li><li>Use ASN as registry to talk with each other (for AWS, 1 ASN/Region; for customer side, also need to configure ASN)</li><li>In PROD, you can setup 2 tunnels per VPC from customer site to connect to each VPC owned by you.</li></ul></li></ul><h3 id="common-maintain-faq-for-vpn"><a class="markdownIt-Anchor" href="#common-maintain-faq-for-vpn"></a> Common maintain FAQ for VPN</h3><ul><li>How to change pre-shared key?</li></ul><blockquote><p>Create a new VPN connection and delete current.  
— (IP config might change)</p></blockquote><ul><li>How to change crypto?</li></ul><blockquote><p>change the config and the it will be updated during negotiation.</p></blockquote><ul><li>Migration VPN to another VPC</li></ul><blockquote><p>detach the VGW from VPC and re-attach to new VPC</p></blockquote><h3 id="vpn-billing"><a class="markdownIt-Anchor" href="#vpn-billing"></a> VPN Billing</h3><ul><li>0.05/hour/VPN</li><li>Data Transfer<ul><li>Flow in is free</li><li>VPC to VPC is not free</li></ul></li></ul><h2 id="direct-connect"><a class="markdownIt-Anchor" href="#direct-connect"></a> Direct Connect</h2><ul><li>For direct connect, your VPC can be private or public</li><li>Data in is free, data out via Direct Connect is cheaper compared to internet</li><li>One direct connection can be shared between multiple AWS accounts</li><li>Direct Connect only use BGP</li><li>Dedicated vs Hosted Direct Connection<ul><li>Dedicated: Connect from AWS Partner DataCenter to AWS router via fiber (1G or 10G)</li><li>Hosted: Connect to AWS via shared connection provided by AWS Partners (50-500MBps)<ul><li>Hosted only support single Virtual interface</li></ul></li></ul></li><li>Public VIF vs Private VIF<ul><li>Private VIF only connect you to VPC (not DNS and not S3)</li><li>Public VIF : can connect you to anything</li><li>Configure VIF specify: Public or Private; VLAN ; BGP Session</li></ul></li></ul><h3 id="maintain-faq-for-direct-connection"><a class="markdownIt-Anchor" href="#maintain-faq-for-direct-connection"></a> maintain FAQ for Direct Connection</h3><ul><li>How to move between accounts<ul><li>don’t delete, raise a support request</li></ul></li><li>How to move VIF<ul><li>Delete and create new (copy the old setting)</li></ul></li><li>Need public IP for VIF?<ul><li>support case</li></ul></li><li>Change bandwith<ul><li>Partner support case</li></ul></li></ul><h3 id="direct-connect-billing"><a class="markdownIt-Anchor" href="#direct-connect-billing"></a> Direct Connect 
Billing</h3><ul><li>More expensive per hour, but cheaper data transfer fees<ul><li>Who pays the data-transfer-out fee depends on who owns the VIF or resource the data leaves through<ul><li>For example, over a public VIF: if you access S3 buckets you own, you pay; if you access S3 buckets owned by someone else, they pay</li><li>For example, over a private VIF: you pay for data transferred out through a VIF you own</li></ul></li></ul></li></ul><h3 id="ipv6-on-direct-connect"><a class="markdownIt-Anchor" href="#ipv6-on-direct-connect"></a> IPv6 on Direct Connect</h3><ul><li>Adding IPv6 to an existing DX<ul><li>Select the current DX, add a peering, and select IPv6</li><li>The DX page then shows both IPv4 and IPv6</li></ul></li><li>A DX can support IPv4, IPv6, or both</li></ul><h2 id="bgp-base-knowledge"><a class="markdownIt-Anchor" href="#bgp-base-knowledge"></a> BGP basics</h2><ul><li>Runs over TCP port 179</li><li>ASN (Autonomous System Number): always check it when creating a connection<ul><li>If you use a public ASN to connect with AWS via BGP, you must own that ASN</li><li>TODO: continue notes from 29 min</li></ul></li><li>iBGP is used between peers in the same ASN; eBGP is used between peers in different ASNs</li><li>AS_PATH: a measure of network distance</li><li>Local Preference: set locally to mark a connection as preferred</li></ul><h2 id="vpc-routing-preference"><a class="markdownIt-Anchor" href="#vpc-routing-preference"></a> VPC routing preference</h2><ul><li>Local VPC routes win</li><li>For non-local addresses, the longest prefix match is checked first (for example, a route for 10.0.0.0/32 wins over 10.0.0.0/16 because it is more specific)</li><li>Static routes win over dynamic routes</li><li>Among dynamic routes, DX wins<ul><li>If both connections are DX, the shorter AS_PATH wins</li><li>If the AS_PATHs are equal, then compare traffic</li></ul></li><li>Then VPN routes (non-DX)<ul><li>Static VPN wins over BGP VPN</li><li>Then compare AS_PATH</li></ul></li></ul><h2 id="aws-vpn-cloudhub"><a class="markdownIt-Anchor" href="#aws-vpn-cloudhub"></a> 
AWS VPN CloudHub</h2><ul><li>Connect AWS (multi-region) to multiple corporate data centers via eBGP</li><li>Advanced scenario: use Direct Connect with one corporate data center, then use VPN CloudHub so that all data centers can use the Direct Connect link to speed things up</li><li>In the many-to-many scenario, the VPN hub is a pair of EC2 instances acting as a software VPN in a transit VPC (use the official CloudFormation template to automate this)</li></ul><h2 id="vpn-over-vif"><a class="markdownIt-Anchor" href="#vpn-over-vif"></a> VPN over VIF</h2><ul><li>Direct Connect is not encrypted by default. What if I want to use DX but also want the traffic encrypted?</li><li>Use the VRF feature of the router hardware to set up a VPN tunnel over DX</li></ul><h1 id="vpc-deep-dive"><a class="markdownIt-Anchor" href="#vpc-deep-dive"></a> VPC Deep Dive</h1><p>Background of VPC: AWS launched VPC in 2009 and later simplified onboarding by creating a default VPC in every account.</p><h2 id="create-a-vpc"><a class="markdownIt-Anchor" href="#create-a-vpc"></a> Create a VPC</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">aws ec2 describe-account-attributes</span><br><span class="line">aws ec2 create-vpc --cidr-block 10.0.0.0/16</span><br></pre></td></tr></table></figure><h2 id="command-to-create-an-ipsec-vpn"><a class="markdownIt-Anchor" href="#command-to-create-an-ipsec-vpn"></a> Commands to create an IPSec VPN</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">aws ec2 create-vpn-gateway --type ipsec.1</span><br><span class="line">aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-f9da06e7 --vpc-id vpc-c15180a4</span><br><span class="line">aws ec2 create-customer-gateway --type ipsec.1 --public-ip 54.64.1.2 --bgp-asn 6500</span><br><span class="line">aws ec2 create-vpn-connection --type ipsec.1 --customer-gateway-id cgw-f4d905ea --vpn-gateway-id vgw-f9da06e7</span><br></pre></td></tr></table></figure>
<h2 id="command-to-create-direct-connect"><a class="markdownIt-Anchor" href="#command-to-create-direct-connect"></a> Commands to create Direct Connect</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">aws directconnect create-connection --location EqSE2 --bandwidth 1Gbps --connection-name My_First</span><br><span class="line">aws directconnect create-private-virtual-interface --connection-id dxcon-fgp13h2s --new-private-virtual-interface virtualInterfaceName=foo,vlan=10,asn=60,authKey=testing,amazonAddress=192.168.0.1/24,customerAddress=192.168.0.2/24,virtualGatewayId=vgw-f9da06e7</span><br></pre></td></tr></table></figure><h2 id="combine-above-2-connectons"><a class="markdownIt-Anchor" href="#combine-above-2-connectons"></a> Combine the above 2 connections</h2><ul><li>We can set up 1 Direct Connect plus 1 VPN between the AWS VPC and the on-premises network.</li></ul><h2 id="configure-vpc-routing-table"><a class="markdownIt-Anchor" href="#configure-vpc-routing-table"></a> Configure the VPC route table</h2><ul><li>Each VPC has 1 default (main) route table, associated with all subnets by default.</li></ul><h2 id="further-step-create-internet-gateway-to-enable-vpcs-internet-connection"><a class="markdownIt-Anchor" href="#further-step-create-internet-gateway-to-enable-vpcs-internet-connection"></a> Further step: create an internet gateway to enable the VPC’s internet connection</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td 
class="code"><pre><span class="line">aws ec2 create-internet-gateway</span><br><span class="line">aws ec2 attach-internet-gateway --internet-gateway-id igw-5a1ae13f --vpc-id vpc-c15180a4</span><br><span class="line">aws ec2 delete-route --route-table-id rtb-ef36e58a --destination-cidr-block 0.0.0.0/0</span><br><span class="line">aws ec2 create-route --route-table-id rtb-ef36e58a --destination-cidr-block 0.0.0.0/0 --gateway-id igw-5a1ae13f</span><br><span class="line">aws ec2 create-route --route-table-id rtb-ef36e58a --destination-cidr-block 192.168.0.0/16 --gateway-id vgw-5a1ae13f</span><br></pre></td></tr></table></figure><h2 id="automatic-route-propagation-from-vgw"><a class="markdownIt-Anchor" href="#automatic-route-propagation-from-vgw"></a> Automatic Route Propagation from VGW</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">aws ec2 delete-route --route-table-id rtb-ef36e58a --destination-cidr-block 192.168.0.0/16</span><br><span class="line">aws ec2 enable-vgw-route-propagation --route-table-id rtb-ef36e58a --gateway-id vgw-5a1ae13f</span><br></pre></td></tr></table></figure><h2 id="isolate-some-of-the-subnets-connection-inside-the-vpc"><a class="markdownIt-Anchor" href="#isolate-some-of-the-subnets-connection-inside-the-vpc"></a> Isolate some subnets’ connections inside the VPC</h2><ul><li>Create a separate route table for those subnets</li></ul><h2 id="software-firewall-to-internet-nat"><a class="markdownIt-Anchor" href="#software-firewall-to-internet-nat"></a> Software firewall to the internet (NAT)</h2><figure class="highlight plain"><figcaption><span>cli</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"># By default this won&apos;t work: disable the source/destination check on the NAT instance&apos;s ENI</span><br><span class="line">aws ec2 modify-network-interface-attribute --network-interface-id eni-f832afcc --no-source-dest-check</span><br><span class="line">aws ec2 create-route --route-table-id rtb-ef36e58a --destination-cidr-block 0.0.0.0/0 --instance-id i-f832afcc</span><br><span class="line">aws ec2 create-route --route-table-id rtb-ef36e31c --destination-cidr-block 0.0.0.0/0 --gateway-id igw-5a1ae13f</span><br></pre></td></tr></table></figure>
<h2 id="vpc-peering"><a class="markdownIt-Anchor" href="#vpc-peering"></a> VPC peering</h2><ul><li>VPC peering works across accounts and across regions</li><li>VPC peering is a managed service with no single point of failure</li><li>Use cases:<ul><li>A shared service running in one VPC, peered with other VPCs</li><li>Separate DEV/TEST/PROD backends: each environment VPC has exactly the same IP setup, and the route table of the peered VPC determines which environment VPC the traffic goes to</li></ul></li><li>Two VPCs to be peered can’t have overlapping IP ranges, but one VPC can peer with multiple VPCs whose ranges overlap each other</li><li>Security groups can’t be referenced across VPCs</li><li>VPC peering can’t be used like a NAT server to route internet traffic through another VPC, because peering does not support edge-to-edge routing<ul><li>A VPC without internet access can’t reach the internet through a peer VPC’s internet gateway or NAT device</li></ul></li></ul><h1 id="remote-connection-best-practice"><a class="markdownIt-Anchor" href="#remote-connection-best-practice"></a> Remote connection Best Practice</h1><ul><li>For each customer gateway, create 2 VPN tunnels terminating in 2 different Availability Zones of the VPC</li><li>Use 2 on-premises customer gateways for failover; each customer gateway then has 2 VPN tunnels to different AZs</li><li>Use 2 Direct Connect links, each linked to 1 AZ, then add an IPSec VPN (another customer gateway) linked to 2 different AZs</li></ul>
<h1 id="vpc-performance"><a class="markdownIt-Anchor" href="#vpc-performance"></a> VPC Performance</h1><p><strong>Packets per second (PPS)</strong>: an important capability for instances to achieve high VPC network performance</p><ul><li>EC2 provides drivers that make better use of the physical network and bypass the virtualization layer</li></ul><h1 id="reference-customer-use-case"><a class="markdownIt-Anchor" href="#reference-customer-use-case"></a> Reference Customer Use Cases</h1><h2 id="live-video"><a class="markdownIt-Anchor" href="#live-video"></a> Live video</h2><ul><li>A backpack containing multiple routers provides multiple internet connections to make sure the live video gets sent</li><li>Inside AWS, the video is transcoded and published</li></ul><h2 id="tradeair"><a class="markdownIt-Anchor" href="#tradeair"></a> TradeAir</h2><p>A small banking solution in the cloud.</p><ul><li>Uses Direct Connect to the on-premises environment</li></ul><h1 id="vpc-migration"><a class="markdownIt-Anchor" href="#vpc-migration"></a> VPC migration</h1><p>ClassicLink: lets EC2-Classic instances communicate with EC2-VPC instances<br>Helps migrate from the EC2-Classic platform into a VPC network.</p><ul><li>Move AWS managed services first</li><li>Make the ELB ready to route traffic to both the Classic and VPC networks</li><li>Start new EC2 instances in the VPC and route traffic to them</li><li>Gracefully turn off the old EC2-Classic instances</li></ul><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><p><a href="https://youtu.be/i6Zf9lwXRcY" target="_blank" rel="noopener">https://youtu.be/i6Zf9lwXRcY</a></p><blockquote><p>VPC deep dive 2015<br><a href="https://youtu.be/B8vnhRJDujw" target="_blank" rel="noopener">https://youtu.be/B8vnhRJDujw</a></p></blockquote><h1 
id="hosted-virtual-interface-vs-hosted-connection-when-using-direct-connection"><a class="markdownIt-Anchor" href="#hosted-virtual-interface-vs-hosted-connection-when-using-direct-connection"></a> Hosted Virtual Interface vs Hosted Connection (when using Direct Connect)</h1><ul><li>Hosted VIF: a virtual interface on a full Direct Connect connection (e.g. 1 Gbps); the connection can be shared with multiple accounts</li><li>Hosted connection: a Direct Connect connection split up by an AWS partner, with a guaranteed speed (for example, 50 Mbps per connection)</li></ul><h2 id="reference-2"><a class="markdownIt-Anchor" href="#reference-2"></a> Reference</h2><blockquote><p><a href="https://youtu.be/r7zamTFGxcM" target="_blank" rel="noopener">https://youtu.be/r7zamTFGxcM</a></p></blockquote><h1 id="vpc-enhancements"><a class="markdownIt-Anchor" href="#vpc-enhancements"></a> VPC enhancements</h1><h2 id="elastic-network-adapter"><a class="markdownIt-Anchor" href="#elastic-network-adapter"></a> Elastic Network Adapter</h2><ul><li>A PCI device that supports variable speeds</li><li>If you are not using an AWS AMI, you need to install the driver to get the best performance</li><li>HVM instances can access AWS’s 10 Gbps physical network card</li><li>Only high-end EC2 instance types support ENA; you can build and install the driver manually</li><li>ENA support can only be enabled; once enabled, it can’t be disabled</li></ul><h1 id="reference-3"><a class="markdownIt-Anchor" href="#reference-3"></a> Reference</h1><blockquote><p><a href="https://youtu.be/CBmSl3O-AhI" target="_blank" rel="noopener">https://youtu.be/CBmSl3O-AhI</a></p></blockquote>
<h1 id="044mp4-045mp4-virtual-private-cloud"><a class="markdownIt-Anchor" href="#044mp4-045mp4-virtual-private-cloud"></a> 044.mp4 045.mp4 – Virtual Private Cloud</h1><ul><li>A VPC lives in a single region and can span multiple Availability Zones</li><li>Class A/B/C private network ranges; subnet masks</li><li>Important: AWS reserves the first 4 and the last 1 IP addresses in every subnet (for example, .0, .1, .2, .3 and .255 in a /24)<ul><li>.0 is the network address</li><li>.1 is reserved for the VPC router</li><li>.2 is reserved for the Amazon-provided DNS server</li><li>.3 is reserved for future use</li><li>.255 is the broadcast address (broadcast is not supported in a VPC, so AWS reserves it)</li></ul></li><li>Default VPC: 172.31.0.0/16, which gives 2^(32-16) = 65,536 addresses; the 5 reserved addresses are taken out of each subnet</li><li>One VPC can have multiple subnets<ul><li>The minimum subnet size is /28: 2^(32-28) = 16 addresses, minus the 5 reserved ones, leaves 11 usable addresses</li></ul></li></ul><h2 id="how-to-connect-to-a-vpc"><a class="markdownIt-Anchor" href="#how-to-connect-to-a-vpc"></a> How to connect to a VPC</h2><ul><li>Via an internet gateway, or via a virtual private gateway over VPN</li><li>Route tables: if any instances in a VPC need internet access, a route in a route table is needed<ul><li>Main route table and custom route tables</li></ul></li><li>NAT gateway: Network Address Translation (NAT)<ul><li>The internet gateway is attached to the VPC; the NAT gateway sits in a public subnet.</li><li>A NAT gateway is not always necessary: instances in a public subnet can access the internet directly through the IGW. Configure NAT only when instances in a private subnet need limited internet access (only connections initiated from the private subnet are allowed, and they must go through the NAT).</li><li>Instances in a private subnet reach the internet through a 0.0.0.0/0 route pointing to the NAT in the public subnet; the NAT then forwards the traffic to the internet.</li><li><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html</a></li></ul></li></ul><h2 id="vpc-security"><a class="markdownIt-Anchor" href="#vpc-security"></a> VPC Security</h2><ul><li>Security groups: instance level</li><li>Network ACLs (Access Control Lists): subnet level</li><li>Flow logs (captured as CloudWatch logs)</li></ul>
<h1 id="046mp4-vpc-handon-demo"><a class="markdownIt-Anchor" href="#046mp4-vpc-handon-demo"></a> 046.mp4 – VPC hands-on demo</h1><ul><li>Use the VPC wizard to create the VPC</li><li>Select the option “VPC with Public and Private Subnets”</li><li>Select the NAT gateway (rates apply): either use the managed service or launch an EC2 instance acting as a NAT (t2.micro is selected here)</li><li>Service endpoints: S3 does not sit inside the VPC. When EC2 instances in the VPC need to access S3, a VPC endpoint lets them reach S3 without going through an internet gateway or NAT; access is still controlled by IAM permissions and the endpoint policy.<ul><li>Select the service (only S3 was available at the time)</li><li>Select which subnets</li><li>Select the access policy</li><li>A VPC can have multiple endpoints</li></ul></li><li>Check the network ACLs that were created and associated with the subnets<ul><li>Check and edit the inbound/outbound rules</li><li>NACL rules are stateless, whereas security groups are stateful (return traffic is allowed automatically)</li><li>NACLs support both allow and deny rules; security groups only support allow rules</li></ul></li><li>Check the subnets: check the routes configured for the public and private subnets</li><li>For an EC2 instance to have internet access: a route to the IGW, and traffic allowed by both the NACL and the security group</li></ul><p>CIDR blocks; what to do when a subnet is too small<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Subnets.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Subnets.html</a><br><a href="https://aws.amazon.com/premiumsupport/knowledge-center/vpc-ip-address-range/" target="_blank" rel="noopener">https://aws.amazon.com/premiumsupport/knowledge-center/vpc-ip-address-range/</a></p><p>Basics: Class A/B/C private network addresses<br><a href="https://en.wikipedia.org/wiki/Private_network" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/Private_network</a></p><p>VPC Peering<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html</a></p><p>VPN-Only Subnet<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario3.html" 
target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario3.html</a></p><p>VPC Security Group<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html</a></p><p>Two types of security groups: for EC2-Classic and for EC2-VPC <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups</a><br>Differences between the 2 types of security groups<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html#VPC_Security_Group_Differences" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html#VPC_Security_Group_Differences</a></p><p>VPC primary private IP address; secondary private IP addresses<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html</a></p><ul><li>The primary private IP address is the main IP address, bound to eth0</li><li>A secondary private IP address is an additional, manually assigned IP address. It can be reassigned</li><li>Both primary and secondary private IP addresses, once assigned, stay associated when the instance is stopped and restarted, and are released when the instance is terminated</li></ul>
<p>ENI: Elastic Network Interface<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html</a></p><ul><li>A virtual network card. Every instance has eth0 by default.</li><li>The number of ENIs that can be attached depends on the instance type.</li><li>ENIs are what make it possible for an instance to have secondary private IP addresses.</li><li>A common scenario is an instance sitting in two subnets: one ENI (e.g. eth0) connects to subnet A and another connects to subnet B.<ul><li>Subnet A is a public subnet; security controls allow the instance to serve HTTP.</li><li>Subnet B is a private subnet; security controls allow the instance to connect only to the VPN gateway, through which specified internal IP addresses can connect over SSH for administration.</li></ul></li></ul><p>BGP-capable VPN device (Border Gateway Protocol)<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html</a></p><p>The tenancy property of an EC2 instance has 2 values to choose from: default or dedicated<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/dedicated-instance.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/dedicated-instance.html</a></p><p>Activate a VPC peering connection<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/create-vpc-peering-connection.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/create-vpc-peering-connection.html</a></p><p>VPC limits<br><a href="https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html</a></p><p>VPN Pricing<br><a href="https://aws.amazon.com/vpc/pricing/" target="_blank" rel="noopener">https://aws.amazon.com/vpc/pricing/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> VPC </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Relational Database Service</title>
      <link href="2018/04/10/markdown/AWS/AWS2018/09_RelationalDBService/"/>
      <url>2018/04/10/markdown/AWS/AWS2018/09_RelationalDBService/</url>
      
        <content type="html"><![CDATA[<h1 id="rds"><a class="markdownIt-Anchor" href="#rds"></a> RDS</h1><h2 id="max-storage-limites"><a class="markdownIt-Anchor" href="#max-storage-limites"></a> Max storage limits</h2><ul><li>MS SQL Server <strong>4 TB</strong></li><li>MySQL, Oracle, PostgreSQL, MariaDB <strong>6 TB</strong></li><li>Aurora <strong>64 TB</strong></li></ul><h2 id="supported-relational-db-types"><a class="markdownIt-Anchor" href="#supported-relational-db-types"></a> Supported Relational DB types</h2><ul><li>Supports 6 relational database engines: MS SQL Server, MySQL, Oracle, Amazon Aurora, PostgreSQL, MariaDB</li><li>Backup: to S3 (the database and its snapshots can be encrypted at rest)</li><li>Failover: Multi-AZ. When the master fails, the standby is promoted, the CNAME is updated to point to the standby, and a new instance is created to replace the old master</li><li>Read replicas (not supported for MS SQL Server and Oracle); one DB can have multiple read replicas<ul><li>With multiple read replicas, route traffic from a single URL using Route 53 or a custom HAProxy; <strong>AWS ELB is not supported</strong></li></ul></li></ul><h1 id="security"><a class="markdownIt-Anchor" href="#security"></a> Security</h1><ul><li>Network firewall control<br>Like EC2, RDS has its own security group settings</li><li>Access control<br>IAM: IAM can’t control who can log in to the database (database users and groups); IAM only controls who has what level of access to the RDS service.</li><li>Compliance and transport encryption (SSL)</li><li>Encryption at rest: uses KMS (Key Management Service) &amp; envelope encryption<ul><li>Limits the risk of a compromised key</li><li>Centralized access control and auditing of key activity</li><li>Just click “Enable encryption”</li></ul></li><li>Encryption-at-rest limitations<ul><li>Only available when creating a new database; once enabled, it can’t be removed.</li><li>An unencrypted snapshot can be copied to an encrypted snapshot</li><li>Encrypted databases across regions (AWS is working on that)</li></ul></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/09_RDS_EncryptDataAtRest.png?raw=true" alt="config to enable encryption"></p>
<h1 id="metrics-and-monitoring"><a class="markdownIt-Anchor" href="#metrics-and-monitoring"></a> Metrics and monitoring</h1><ul><li>1-minute interval by default</li><li>Enhanced Monitoring: more detailed, with granularity down to 1 second</li><li>AWS <strong>Performance Insights for RDS</strong>: designed for RDS; shows top SQL and helps identify bottlenecks</li></ul><h1 id="high-availability"><a class="markdownIt-Anchor" href="#high-availability"></a> High Availability</h1><ul><li><p>Failover takes 30 seconds to a few minutes</p></li><li><p>Done automatically by switching the DNS CNAME from the primary to the secondary</p></li><li><p>Read replicas can be launched into different regions</p></li><li><p>Aurora is different from a traditional DB</p><ul><li>It has a concept similar to shards</li><li><strong>Read Replica Endpoint</strong>: a single endpoint for the back-end read replica cluster</li></ul></li></ul><h1 id="scaling-rdb"><a class="markdownIt-Anchor" href="#scaling-rdb"></a> Scaling RDB</h1><ul><li>Scale your master database<ul><li>Change the instance class; optionally choose “Apply immediately”, otherwise the change waits for the maintenance window</li><li>Takes around 6 minutes</li><li>For a Multi-AZ database with a standby, the standby is upgraded first, then the standby is flagged as master and the master as standby, and then the old master is updated (around 20 minutes)</li><li><strong>Automatic scaling</strong>: schedule scale-up and scale-down depending on usage</li></ul></li></ul><p>Option 1: AWS CLI + cron</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">aws rds modify-db-instance --db-instance-identifier sg-cli-test --db-instance-class db.m4.large --apply-immediately</span><br><span class="line"><span class="meta">#</span><span class="bash"> Scale down at 8 PM on Friday</span></span><br><span class="line">0 20 * * 5 ~/scale_down_rds.sh</span><br><span class="line"><span class="meta">#</span><span class="bash"> Scale up at 4 AM on Monday</span></span><br><span class="line">0 4 * * 1 ~/scale_up_rds.sh</span><br></pre></td></tr></table></figure>
<p>Option 2: AWS Lambda (Python + the boto3 library)</p><p>Option 3: metrics + SNS/SQS + Lambda -&gt; scale up/down</p><ul><li>Scale the number of replicas</li><li>Storage scaling</li><li>Scale the provisioned IOPS</li></ul><h1 id="backup-snapshots"><a class="markdownIt-Anchor" href="#backup-snapshots"></a> Backup &amp; Snapshots</h1><h2 id="backup"><a class="markdownIt-Anchor" href="#backup"></a> Backup</h2><p>Backups are scheduled automatically and are configurable</p><ul><li>Aurora<ul><li>Continuous, no performance impact</li></ul></li><li>Other RDB engines<ul><li>Default retention is 1 day, configurable up to a maximum of 35 days</li><li>Daily by default; you can select the backup window</li><li>For Multi-AZ, the backup is taken from the standby</li><li>Example: with the backup scheduled at 7 PM, AWS takes a full copy of the data starting at 7 PM, plus extra logs every 5 minutes, until the copy finishes at 7:30 PM. 7:30 PM then becomes the “Latest Restorable Time”, and a database restored from this backup can go back to any time from 7 PM to 7:30 PM.</li></ul></li></ul><h2 id="snapshot"><a class="markdownIt-Anchor" href="#snapshot"></a> Snapshot</h2><p>Snapshots are triggered manually, unlike backups.</p><h2 id="migration"><a class="markdownIt-Anchor" href="#migration"></a> Migration</h2><p>MySQL dump --&gt; S3 (multipart upload; Snowball) --&gt; restore Aurora from S3</p><p><strong>AWS Database Migration Service</strong>:<br>heterogeneous database migration</p>
<h1 id="037mp4-aws-rds-hands-on"><a class="markdownIt-Anchor" href="#037mp4-aws-rds-hands-on"></a> 037.mp4 – AWS RDS Hands On</h1><ul><li>Create the MySQL service (PROD must use Multi-AZ, but dev can be a single instance)</li><li>Change the storage away from “General Purpose SSD”, which is not suitable for databases that need high IOPS</li><li>Enable Multi-AZ</li><li>Publicly accessible: disable; control security using the VPC</li></ul><p>RDS Security groups<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html</a></p><p>DB Parameter group<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html</a><br>A group of database configuration parameters.</p><p>DB Option Group; lifecycle of the option group configuration (e.g. for a restored DB)<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithOptionGroups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithOptionGroups.html</a><br>A group of database feature options.</p><p>Comments:<br>Option groups allow the use of available features within a database. So if you spun up a SQL Server RDS instance, you could configure TDE, native SQL backup/restore, or mirroring. Parameter groups are how the database is configured: min/max settings, etc.</p>
<h1 id="知识点"><a class="markdownIt-Anchor" href="#知识点"></a> Key Points</h1><ul><li><p>I/O suspension when RDS takes a snapshot</p><ul><li>Single-AZ: the database will always see an I/O suspension</li><li>Multi-AZ with backups configured: the snapshot is taken on the backup (standby) database, so there is no I/O suspension</li></ul></li><li><p>Choosing a MySQL storage engine</p></li><li><p>InnoDB storage engine: Point-In-Time restore and snapshot restore require a recoverable storage engine and are supported for the InnoDB storage engine only.</p></li><li><p>The MyISAM storage engine does not support reliable recovery and can result in lost or corrupt data when MySQL is restarted after a recovery, preventing Point-In-Time restore or snapshot restore from working as intended. However, if you still choose to use MyISAM with Amazon RDS, snapshots can be helpful under some conditions.</p></li><li><p>MyISAM performs better than InnoDB if you require intense, full-text search capability.</p></li><li><p>The Federated Storage Engine is currently not supported by Amazon RDS for MySQL.</p></li><li><p>3306 is the default MySQL port; 3389 is the default Remote Desktop (RDP) port</p></li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>035.mp4 036.mp4 – AWS RDS Services</p></blockquote><blockquote><p>Deep Dive<br><a href="https://youtu.be/pPLPzPYY5uU" target="_blank" rel="noopener">https://youtu.be/pPLPzPYY5uU</a></p></blockquote><p>Pricing<br><a href="http://calculator.s3.amazonaws.com/index.html#s=RDS" target="_blank" rel="noopener">http://calculator.s3.amazonaws.com/index.html#s=RDS</a></p><p>Best practices<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html</a></p><p>Restore the RDB to a point in time<br><a 
href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html</a></p><p>RDB maintenance sequence<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.Maintenance.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.Maintenance.html</a></p><p>Steps to delete a DB instance<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_DeleteInstance.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_DeleteInstance.html</a></p><p>Monitoring types<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html</a></p><p>CloudWatch and CloudTrail to help monitor DB performance<br><a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/send-cloudtrail-events-to-cloudwatch-logs.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/awscloudtrail/latest/userguide/send-cloudtrail-events-to-cloudwatch-logs.html</a></p><p>Automatic backups; backup retention periods<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html</a></p><p>PIOPS, IOPS and page size; ratio of the requested IOPS rate to the amount of storage allocated<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html</a></p><p>RDS reserved instances<br><a href="https://aws.amazon.com/blogs/aws/reserved-instance-options-for-amazon-ec2/" target="_blank" 
rel="noopener">https://aws.amazon.com/blogs/aws/reserved-instance-options-for-amazon-ec2/</a></p><p>DB Subnet group<br><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html#USER_VPC.Subnets" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html#USER_VPC.Subnets</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS RDS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CloudFront</title>
      <link href="2018/04/10/markdown/AWS/AWS2018/08_CloudFront/"/>
      <url>2018/04/10/markdown/AWS/AWS2018/08_CloudFront/</url>
      
        <content type="html"><![CDATA[<h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><p>Canaries: when deploying, limit the new version to a contained environment until it is fully tested, then roll it out.</p><p>CloudFront is the content delivery network (CDN) from AWS.</p><p><strong>CloudFront PoP</strong>: Point of Presence, i.e. an edge location</p><ul><li>Located in data centers in major metropolitan areas, with direct connections to multiple ISPs</li><li>Terminates viewer connections</li><li>Requests are routed to CloudFront mainly at the <strong>DNS layer</strong></li></ul><p>CloudFront is deployed at edge locations (number of edge locations &gt; Availability Zones &gt; Regions)</p><ul><li>The origin can be S3, or an HTTP server on AWS or outside AWS</li><li>CloudFront caches only GET and HEAD requests; POST, PUT, and DELETE requests are only proxied to the origin</li><li>Different caching combinations for static and dynamic websites<ul><li>Static: cache only documents and exclude HTML/CSS/code, or set a TTL and cache everything</li><li>Dynamic: cache only static content and exclude PHP/code, or set a TTL and cache everything</li></ul></li></ul><h1 id="how-it-works"><a class="markdownIt-Anchor" href="#how-it-works"></a> How it works</h1><p>Cache key: generated from the request URL (query string and protocol removed, encoding added)</p><p>Expiration time: set by the origin’s response headers (for example, Expires)<br>Max age: for example, Cache-Control: max-age=300 means 5 minutes</p><h2 id="cloudfront-configurations"><a class="markdownIt-Anchor" href="#cloudfront-configurations"></a> CloudFront Configurations</h2><ul><li>Configure <strong>Cache Behaviors</strong></li></ul><h1 id="best-practise"><a class="markdownIt-Anchor" href="#best-practise"></a> Best Practices</h1><h2 id="try-to-use-ecs-enabled-resolver"><a class="markdownIt-Anchor" href="#try-to-use-ecs-enabled-resolver"></a> Try to Use an ECS-Enabled Resolver</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/08_CloudFront_Issue.png?raw=true" 
alt="CloudFront Common Issue"></p><p>Issue: when a viewer sends a DNS request through their ISP, the ISP forwards it to AWS and the viewer’s original location is lost, so Route 53 returns the edge location with the lowest latency to the ISP’s resolver instead of to the viewer.</p><p>Resolution:</p><ul><li>Use a local resolver</li><li>Use a resolver that supports ECS (EDNS0 Client Subnet), so the DNS query includes the requester’s network information</li><li>Some public resolvers, such as Google’s, support this newer standard.</li></ul><h2 id="cache-the-error-page"><a class="markdownIt-Anchor" href="#cache-the-error-page"></a> Cache the error page</h2><p>When the origin returns an error page, that static error page can also be cached.</p><h2 id="cache-dynamic-content"><a class="markdownIt-Anchor" href="#cache-dynamic-content"></a> Cache dynamic content</h2><p>Set TTL = 0 (???)</p><h2 id="version-your-contents"><a class="markdownIt-Anchor" href="#version-your-contents"></a> Version your content</h2><p>If you don’t, the new version might not be reflected. 
Versioning your content means each version of an object has a different URL.</p><h2 id="minimize-forwardheaders-values"><a class="markdownIt-Anchor" href="#minimize-forwardheaders-values"></a> Minimize forwarded header values</h2><p>Forwarded headers become part of the cache key, so each extra forwarded header lowers the cache hit ratio.</p><h2 id="debug-use-the-cloudfront-logs"><a class="markdownIt-Anchor" href="#debug-use-the-cloudfront-logs"></a> Debug: use the CloudFront logs</h2><p>Log your request IDs</p><h2 id="use-pre-configured-waf-rules-with-waf-to-protect-cloudfront"><a class="markdownIt-Anchor" href="#use-pre-configured-waf-rules-with-waf-to-protect-cloudfront"></a> Use pre-configured AWS WAF rules to protect CloudFront</h2><h2 id="protect-private-content"><a class="markdownIt-Anchor" href="#protect-private-content"></a> Protect Private Content</h2><p>How to stop users from accessing your origin directly:<br>Option 1: prevent direct connections to the origin (OAI, Origin Access Identity)<br>Option 2: the origin accepts requests only from the caching layer (restrict source IP addresses)</p><p>Use signed URLs to restrict access to individual files (no valid URL, no access)<br>Use signed cookies to restrict access to multiple files (for example, a whole module)</p><h1 id="monitoring"><a class="markdownIt-Anchor" href="#monitoring"></a> Monitoring</h1><p>Metrics: server-side metrics; canaries; third-party HTTP tests</p><h1 id="design-pattern-for-availability"><a class="markdownIt-Anchor" href="#design-pattern-for-availability"></a> Design Patterns for Availability (???)</h1><p><strong>Food Tasting</strong><br><strong>Flash Crowds</strong><br><strong>Defence In Depth</strong><br><strong>Time Bomb Jitter Protection</strong></p><h1 id="rum-real-user-monitoring"><a class="markdownIt-Anchor" href="#rum-real-user-monitoring"></a> RUM (Real User Monitoring)</h1><h2 id="synthetic-monitoring"><a class="markdownIt-Anchor" href="#synthetic-monitoring"></a> Synthetic Monitoring</h2><ul><li>Synthetic Monitoring<ul><li>Simulates clients to test the application’s performance; 
can be used as a baseline for the application</li></ul></li><li>Provides a consistent signal of service health</li></ul><h2 id="rum"><a class="markdownIt-Anchor" href="#rum"></a> RUM</h2><ul><li>A script injected into the web page sends data back, where it is aggregated.</li><li>Analyzing RUM data<ul><li>Use CloudFront</li><li>Use multiple Regions</li><li>Use HTTP/2 (multiplexing serves multiple objects concurrently over one connection)</li></ul></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>033.mp4 034.mp4 – CloudFront overview</p></blockquote><p>Design patterns for HA: lessons from AWS CloudFront (???)</p><blockquote><p><a href="https://youtu.be/n8qQGLJeUYA" target="_blank" rel="noopener">https://youtu.be/n8qQGLJeUYA</a></p></blockquote><p>CloudFront best practices</p><blockquote><p><a href="https://youtu.be/fgbJJ412qRE" target="_blank" rel="noopener">https://youtu.be/fgbJJ412qRE</a></p></blockquote><h1 id="using-amazon-cloudfront-for-your-websites-apps"><a class="markdownIt-Anchor" href="#using-amazon-cloudfront-for-your-websites-apps"></a> Using Amazon CloudFront For Your Websites &amp; Apps</h1><h2 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h2><blockquote><p><a href="https://youtu.be/gUAuhdtHacI" target="_blank" rel="noopener">https://youtu.be/gUAuhdtHacI</a></p></blockquote><ul><li>Tangible takeaways</li></ul><h2 id="best-practise-to-set-up-your-origin"><a class="markdownIt-Anchor" href="#best-practise-to-set-up-your-origin"></a> Best Practices for Setting Up Your Origin</h2><ul><li>Use Route 53 health checks and DNS failover for your origin<ul><li>Enable latency-based routing to multiple edge locations<ul><li>Sample scenario: a single-Region app acts as the origin for four CloudFront edge locations in four Regions<ul><li>Improved: multiple origins in multiple Regions; if one origin is down, CloudFront fetches from an available origin</li></ul></li></ul></li><li>If enabled, Route 53 automatically runs health checks against 
your origin, and if the backend is unhealthy, requests are automatically routed to other healthy origins</li></ul></li><li>Configure multiple origins<ul><li>Set up multiple origins: dynamic content to the application and static content to S3</li></ul></li><li>Secure your origin<ul><li>If content is hosted in S3, restrict the origin with an OAI (Origin Access Identity) so that only CloudFront can access it; this improves performance and protects the S3 bucket</li><li>If content is served by a backend behind the CloudFront cache, restrict the backend to an allow list of CloudFront’s IP addresses.<ul><li>To update these addresses automatically: subscribe to the official AWS SNS topic; when the IP ranges change, use Lambda to parse the notification and update the security groups.</li></ul></li><li>Prevent backend overload</li></ul></li><li>Log request IDs<ul><li>Generate a request ID and log it to help debugging (???)</li></ul></li><li>Set origin response headers<ul><li>Strict-Transport-Security: max-age=15552000 (180 days)<ul><li>tells the browser to send requests only over HTTPS; even if the address is http, the browser switches it to https.</li><li>helps prevent protocol downgrade attacks</li></ul></li><li>X-Frame-Options: SAMEORIGIN<ul><li>tells the browser whether the response may be displayed inside an iframe. 
If any site may frame it, the content can be hijacked and embedded in another website’s iframe (clickjacking); SAMEORIGIN allows framing only by pages from the same origin.</li></ul></li><li>X-XSS-Protection: 1; mode=block<ul><li>enables the browser’s XSS filter and blocks rendering of the page if an attack is detected</li></ul></li><li>Cache-Control: max-age=300, public</li></ul></li></ul><p>Demo: create a security group that allows all CloudFront IP addresses and apply it to the real origin inside the VPC subnet; use Lambda to update the security group from the AWS-published CloudFront IP list (via SNS)</p><h2 id="gaining-visibility-into-your-distribution"><a class="markdownIt-Anchor" href="#gaining-visibility-into-your-distribution"></a> Gaining visibility into your distribution</h2><p>Scenario: monitor cache hits and misses directly in CloudFront’s reports.<br>Scenario: in the cache statistics, “Percentage of GET Requests that Didn’t Finish Downloading” is surging.</p><ul><li>Check Popular Objects ranked by “Incomplete Download”; a possible cause is that a file is too large and not segmented.<br>Scenario: check “Viewers” to see customers by location (country)<br>Scenario: check “Viewers” to see customers by device</li></ul><p>Scenario: how do you identify bots? Send logs to CloudWatch, filter them for bots, and create alarms.<br>Scenario: how do you disable SSLv3 gracefully? 
Send logs to CloudWatch, filter them to see how many users still use SSLv3, then decide whether to turn SSLv3 off.</p><h2 id="how-to-improve-cacheability"><a class="markdownIt-Anchor" href="#how-to-improve-cacheability"></a> How to improve cacheability</h2><h3 id="versionning-website-assets"><a class="markdownIt-Anchor" href="#versionning-website-assets"></a> Versioning website assets</h3><ul><li><p>Two ways to version cached assets:</p><ul><li>Use a version number: <a href="http://www.sample.com/v1/css/style.css" target="_blank" rel="noopener">www.sample.com/v1/css/style.css</a>; <a href="http://www.sample.com/v2/css/style.css" target="_blank" rel="noopener">www.sample.com/v2/css/style.css</a></li><li>Use an MD5 checksum: <a href="http://www.sample.com/css/style.css" target="_blank" rel="noopener">www.sample.com/css/style.css</a>?&lt;md5sum&gt; (the hash changes whenever the file content changes, so the URL changes too; CloudFront must be configured to include the query string in the cache key)</li></ul></li><li><p>Benefits</p><ul><li>You can monitor the error rate when releasing a new version and roll back easily if something goes wrong.</li></ul></li><li><p>Common CloudFront caching strategies</p><ul><li>JS, CSS, images: set to 1 year (as long as possible)</li><li>index.html (no cache, max-age = maximum session age); make sure a unique session ID is generated for each user</li><li>Live streaming (max-age=2, so users keep up with the stream)</li></ul></li><li><p>Shared assets across multiple properties</p><ul><li>A single S3 bucket as the origin for different domains, for example one .org and one .com</li></ul></li><li><p>Forwarded values<br>When a request hits CloudFront but some dynamic content still has to come from the backend origin, the forwarded values define which headers (or cookies, such as JSESSIONID) are forwarded to the origin.</p></li><li><p>Invalidating the CloudFront cache</p><ul><li>An invalidation only clears the cache in CloudFront; it does not touch the cache in the client’s browser.</li></ul></li></ul><h2 id="how-to-test-your-configuration"><a class="markdownIt-Anchor" href="#how-to-test-your-configuration"></a> How to test your 
configuration</h2><ul><li>Test in development mode<ul><li>Set the TTL to 0 (requests go through CloudFront but are not cached)</li><li>Use AWS WAF to allow only the office IP addresses</li><li>Use Chrome developer tools to check the headers added by CloudFront (for example, X-Cache)</li></ul></li><li>Performance testing<ul><li>Backbone testing: a network provider tests from data centers</li><li>Last-mile testing: a network provider tests from end-user locations</li><li>Real-user testing: measure real website users’ performance by injecting scripts</li></ul></li><li>Load testing<ul><li>Traditional load testing runs from one client, which makes it hard to simulate DNS load balancing and real user environments.</li><li>Better: use distributed clients with different IPs in different Regions</li></ul></li><li>SSL Labs<ul><li>Helps you verify your SSL/TLS configuration</li><li><a href="https://www.ssllabs.com/" target="_blank" rel="noopener">https://www.ssllabs.com/</a></li></ul></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS CloudFront </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - CLI</title>
      <link href="2018/04/10/markdown/AWS/AWS2018/07_CLI/"/>
      <url>2018/04/10/markdown/AWS/AWS2018/07_CLI/</url>
      
        <content type="html"><![CDATA[<h1 id="030mp4-cli"><a class="markdownIt-Anchor" href="#030mp4-cli"></a> 030.mp4 – CLI</h1><p>Two ways to interact with AWS via the CLI:</p><ul><li>Install it locally; it calls AWS services over HTTPS</li><li>Install it on an EC2 instance, connect to the instance via SSH, and run it there (it still calls AWS services over HTTPS)</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">aws --version                              # show the installed CLI version</span><br><span class="line">aws s3 mb s3://newbucket                   # make a bucket (bucket names are globally unique)</span><br><span class="line">aws s3 ls                                  # list your buckets</span><br><span class="line">aws s3 rb s3://newbucket                   # remove the bucket (it must be empty unless you add --force)</span><br><span class="line">aws ec2 help                               # list the EC2 subcommands</span><br><span class="line">aws ec2 describe-instances                 # describe the EC2 instances in the configured Region</span><br><span class="line"></span><br><span class="line">aws iam create-user --user-name newuser    # create an IAM user named newuser</span><br></pre></td></tr></table></figure><p>TODO:<br>set up the CLI on a Mac and do hands-on AWS practice</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> CLI </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - S3</title>
      <link href="2018/04/09/markdown/AWS/AWS2018/05_SimpleStorageService_S3/"/>
      <url>2018/04/09/markdown/AWS/AWS2018/05_SimpleStorageService_S3/</url>
      
        <content type="html"><![CDATA[<h1 id="s3-overview"><a class="markdownIt-Anchor" href="#s3-overview"></a> S3 Overview</h1><p>S3 is an object store accessed over the web, not a file system!</p><p>S3 consistency model for writes (as of 2018)</p><ul><li>“Eventual consistency”: overwrite PUTs and DELETEs propagate eventually, so a read right after such a write may return stale data</li><li>“Read-after-write consistency”: a newly created object can be read immediately after a successful PUT</li><li>Since December 2020, S3 provides strong read-after-write consistency for all PUT and DELETE requests.</li></ul><h2 id="s3-event-notification"><a class="markdownIt-Anchor" href="#s3-event-notification"></a> S3 Event Notification</h2><ul><li>Configure which events trigger a notification</li><li>Configure filters: key prefix (path and name) and suffix</li><li>Notifications can be sent to Lambda, SQS, or SNS</li></ul><h2 id="lifecycle-management"><a class="markdownIt-Anchor" href="#lifecycle-management"></a> Lifecycle Management</h2><p>S3 --&gt; S3 IA --&gt; Glacier</p><ul><li>S3 IA has the same durability as S3 Standard (eleven 9’s), but lower availability (99.9% vs 99.99%)</li><li>You can move objects to IA with a lifecycle policy, or upload directly to IA by specifying the storage class</li></ul><h2 id="cross-region-replication"><a class="markdownIt-Anchor" href="#cross-region-replication"></a> Cross-Region replication</h2><ul><li><p>ACLs (Access Control Lists) are replicated; objects encrypted with S3-managed keys are replicated too, but if KMS encryption is used you have to handle it yourself</p></li><li><p>Objects that existed before you turn on this feature must be copied to the new Region manually (only new PUTs are replicated)</p></li><li><p><strong>Versioning must be enabled</strong></p></li><li><p>The destination bucket can be owned by a different account</p></li><li><p>You can replicate only objects with a certain prefix (folder)</p></li><li><p>If you delete an object by specifying its version ID, the deletion is not replicated to the destination bucket (you have to delete it there manually);</p><ul><li>Another source says “delete and lifecycle actions are not replicated”</li></ul></li><li><p>You can configure bucket A to replicate to B, and then configure B to replicate 
to A. This keeps A and B in sync.</p></li><li><p>Once replication is turned on, you can call head-object on an S3 object and check its ReplicationStatus metadata</p></li><li><p>Useful when you want to speed up uploads to a remote Region: upload to the local Region, then replicate to the remote one</p></li></ul><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/05_S3_CrossRegionReplication.png?raw=true" alt="cross Region Replication cli"></p><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/05_S3_CrossRegionRepli_Policy.png?raw=true" alt="cross Region Replication policy"></p><h1 id="026mp4-hands-on-demo-on-s3-polices-and-acl"><a class="markdownIt-Anchor" href="#026mp4-hands-on-demo-on-s3-polices-and-acl"></a> 026.mp4 - Hands-on demo on S3 policies and ACLs</h1><ul><li>Add, edit, or delete bucket and object permissions (which grantee has what access), at either the bucket level or the object level</li><li>Create a bucket policy (JSON) with the AWS Policy Editor or from samples; policies are available only at the bucket level</li></ul><p>How to specify an S3 object using an ARN, for example:<br>arn:aws:s3:::examplebucket/developers/design_info.doc</p><blockquote><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-arn-format.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-arn-format.html</a></p></blockquote><p>S3 Reduced Redundancy Storage (RRS) option<br><a href="https://aws.amazon.com/s3/reduced-redundancy/" target="_blank" rel="noopener">https://aws.amazon.com/s3/reduced-redundancy/</a></p><p>S3 bucket URLs<br><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html</a></p><p>Difference between durability, availability, and concurrent facility fault tolerance<br><a href="https://aws.amazon.com/s3/reduced-redundancy/" target="_blank" 
rel="noopener">https://aws.amazon.com/s3/reduced-redundancy/</a></p><p>How to share objects with others via a pre-signed URL<br><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html</a></p><ul><li>The URL is created from your credentials plus an expiration time</li></ul><p>Difference between (ACLs are the legacy way to control access to S3):<br>Bucket ACL/policy<br>Object ACL/policy</p><p>Error response codes:<br><a href="https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html</a><br>For durability, RRS objects have an average annual expected loss of 0.01% of objects. If an RRS object is lost, when requests are made to that object, Amazon S3 returns a 405 (Method Not Allowed) error.</p><h2 id="best-practise-to-max-the-s3-performance"><a class="markdownIt-Anchor" href="#best-practise-to-max-the-s3-performance"></a> Best practices to maximize S3 performance</h2><h3 id="key-naming"><a class="markdownIt-Anchor" href="#key-naming"></a> Key naming</h3><p>Useful when your S3 request rate exceeds 100 TPS (requests per second)<br><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html</a><br>Note: since July 2018, S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so randomized key prefixes are no longer required.</p><ul><li>An S3 object’s path acts as its key. To spread keys evenly across partitions for better access performance, you can:<br>1) add a hash to the front of the key<br>2) reverse the key string</li></ul><p>Sample of anti-pattern</p><p>All of these files will be stored on the same partition, which does not favor concurrent access. 
Especially once requests reach 100 TPS, avoid the naming convention below.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&lt;my-bucket&gt;/2018-0601-001.png</span><br><span class="line">&lt;my-bucket&gt;/2018-0601-002.png</span><br><span class="line">&lt;my-bucket&gt;/2018-0601-003.png</span><br><span class="line">&lt;my-bucket&gt;/2018-0602-001.png</span><br><span class="line">&lt;my-bucket&gt;/2018-0602-002.png</span><br></pre></td></tr></table></figure><h3 id="s3-transfer-accerleration"><a class="markdownIt-Anchor" href="#s3-transfer-accerleration"></a> <strong>S3 Transfer Acceleration</strong></h3><ul><li>Once enabled, it provides a new endpoint (bucketname.s3-accelerate.amazonaws.com);</li><li>uploads are automatically routed to the closest edge location.</li></ul><h3 id="multi-part-uploading"><a class="markdownIt-Anchor" href="#multi-part-uploading"></a> Multipart upload</h3><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html</a></p><p>GET requests: put a Range header in the request to fetch byte ranges in parallel</p><h3 id="make-use-of-cloudfront"><a class="markdownIt-Anchor" href="#make-use-of-cloudfront"></a> Make use of CloudFront</h3><p>Difference between ElastiCache and CloudFront: which one should cache S3 objects?<br>ElastiCache uses Redis and Memcached to improve the performance of web applications by allowing you to retrieve information from fast, managed, in-memory data stores, instead of relying entirely on slower disk-based databases.</p><p>CloudFront, by contrast, is a global content delivery network (CDN) service that accelerates delivery of your websites, APIs, video content, or other web assets.</p><p><a 
href="https://acloud.guru/forums/aws-certified-solutions-architect-associate/discussion/-KbKW7V5i1NIZ6n9S_TK/what_is_the_difference_between" target="_blank" rel="noopener">https://acloud.guru/forums/aws-certified-solutions-architect-associate/discussion/-KbKW7V5i1NIZ6n9S_TK/what_is_the_difference_between</a></p><p>How S3 detects data corruption<br>S3 uses Content-MD5 checksums and CRCs to detect data corruption<br><a href="https://aws.amazon.com/s3/faqs/#" target="_blank" rel="noopener">https://aws.amazon.com/s3/faqs/#</a></p><h1 id="hands-on"><a class="markdownIt-Anchor" href="#hands-on"></a> Hands-On</h1><ul><li>Enable and suspend versioning at the bucket level (once enabled, versioning can be suspended but not disabled)</li><li>Before versioning is enabled, objects have a version ID of <strong>null</strong></li><li>Delete an object and revert the deletion</li><li>List the versions of an object and restore a selected version</li><li>Add a lifecycle rule to the selected bucket: which objects in the bucket move to which storage class after a set number of days, and are deleted from S3 after another set number of days</li><li>Enable server access logging</li><li>Add tags to the S3 bucket (easier management)</li><li>Define cross-Region replication</li><li>Define event notifications to SNS, SQS, or Lambda</li><li>Requester Pays</li></ul><h2 id="new-feature"><a class="markdownIt-Anchor" href="#new-feature"></a> New features</h2><p>Retrieve data from Glacier within minutes<br><a href="https://aws.amazon.com/about-aws/whats-new/2016/11/access-your-amazon-glacier-data-in-minutes-with-new-retrieval-options/" target="_blank" rel="noopener">https://aws.amazon.com/about-aws/whats-new/2016/11/access-your-amazon-glacier-data-in-minutes-with-new-retrieval-options/</a></p><p>S3 Analytics: storage class analysis</p><ul><li>Categorizes access patterns by object prefix, name, bucket, or tag to help you create the right lifecycle rules</li></ul><p>S3 Inventory:</p><ul><li>Saves the cost of List API calls when you have many objects: it generates a list every day or every week, depending on your configuration</li><li>You need to configure a 
bucket policy that allows S3 to write the inventory report into your bucket</li></ul><p>Object tags:</p><ul><li>Maximum 10 tags per object</li><li>Tags can be used in lifecycle management, access control, or storage analysis</li></ul><p>CloudTrail: S3 data events:</p><ul><li>Object-level events</li><li>For security and audit purposes</li></ul><p>CloudWatch: S3 metrics</p><ul><li>Not free: $0.30 per metric (at the time of writing)</li><li>Reported every 1 minute</li></ul><h2 id="vpc-endpoint-for-s3"><a class="markdownIt-Anchor" href="#vpc-endpoint-for-s3"></a> VPC endpoint for S3</h2><ul><li>Security control<ul><li>The VPC endpoint policy can allow specific actions on specific S3 buckets</li><li>The S3 bucket policy can specify which VPC is allowed to access it.</li></ul></li></ul><h1 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h1><blockquote><p>024.mp4 - S3 overview</p></blockquote><h1 id="other-important-thing-about-s3"><a class="markdownIt-Anchor" href="#other-important-thing-about-s3"></a> Other important things about S3</h1><h2 id="how-s3-authenticate-the-http-request"><a class="markdownIt-Anchor" href="#how-s3-authenticate-the-http-request"></a> How S3 authenticates HTTP requests</h2><ul><li>The user signs selected request elements (the string to sign) with their secret access key using HMAC, and sends the resulting signature with the request</li><li>S3 receives the request and retrieves the user’s secret key</li><li>S3 computes the signature the same way</li><li>S3 compares its signature with the one the user submitted; if they match, authentication succeeds.</li></ul><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> S3 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Handson NodeJs</title>
      <link href="2018/04/09/markdown/AWS/AWS2018/06_HandsOn_NodeJSDevelopment/"/>
      <url>2018/04/09/markdown/AWS/AWS2018/06_HandsOn_NodeJSDevelopment/</url>
      
        <content type="html"><![CDATA[<h1 id="027mp4-028mp4-029mp4-set-up-dev-environments"><a class="markdownIt-Anchor" href="#027mp4-028mp4-029mp4-set-up-dev-environments"></a> 027.mp4, 028.mp4, 029.mp4 – Set up dev environments</h1><ul><li><p>Create users and download their access credentials</p></li><li><p>Create a group representing developers, attach a policy to the group, and add the users to it</p></li><li><p>Create a role for the EC2 instance</p></li><li><p>Create a security group in the VPC and define inbound and outbound rules (open HTTP/HTTPS/SSH)</p></li><li><p>Launch the instance: select a community AMI; enable a public IP; attach the role created above; enable termination protection; under advanced details, add a bash script (user data); tag it; attach the security group; create and download a key pair to access the instance</p></li><li><p>Install two useful Atom plugins, remote-edit and git-plus; edit a file on the EC2 server, save it, and refresh the page</p></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> AWS developer </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Python 101</title>
      <link href="2018/03/18/markdown/python/Python101/"/>
      <url>2018/03/18/markdown/python/Python101/</url>
      
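        <content type="html"><![CDATA[<h1 id="pythcharm-edu"><a class="markdownIt-Anchor" href="#pythcharm-edu"></a> PyCharm Edu</h1><p>Python doesn’t need semicolons; a newline ends a statement.<br>String indexing accepts negative indices that count from the end; so does list indexing!<br>A string containing line breaks can be enclosed in triple double quotes (&quot;&quot;&quot;...&quot;&quot;&quot;).<br>List slicing uses : to mean “up to” (the end index is excluded, e.g. a[1:3])<br>Tuples: immutable sequences<br>[] list, () tuple, {} dictionary<br>A function’s docstring goes on the line after the def line, enclosed in &quot;&quot;&quot; &quot;&quot;&quot;<br><code>__init__(self)</code> is the initializer; self is required<br>Another way to import: from &lt;python file name&gt; import &lt;function or class name&gt;</p><ul><li>No need to escape &quot; inside single quotes (‘’)</li><li><code>print(&quot;Hello, %s! I am %d years old&quot; % (name, year))</code></li></ul>]]></content>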
      
      
      
        <tags>
            
            <tag> python </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Immersion Day</title>
      <link href="2018/03/14/markdown/AWS/AWS2018/AWS_ImmersionDay/"/>
      <url>2018/03/14/markdown/AWS/AWS2018/AWS_ImmersionDay/</url>
      
        <content type="html"><![CDATA[<h1 id="agenda"><a class="markdownIt-Anchor" href="#agenda"></a> Agenda</h1><p><a href="http://aws.johnhildebrandt.info/" target="_blank" rel="noopener">http://aws.johnhildebrandt.info/</a></p><h2 id="introduction-to-aws-ec2-overview"><a class="markdownIt-Anchor" href="#introduction-to-aws-ec2-overview"></a> Introduction to AWS &amp; EC2 Overview</h2><p><a href="https://stackoverflow.com/questions/29575877/aws-efs-vs-ebs-vs-s3-differences-when-to-use" target="_blank" rel="noopener">https://stackoverflow.com/questions/29575877/aws-efs-vs-ebs-vs-s3-differences-when-to-use</a></p><p>EC2 attaches EBS volumes, and EBS snapshots are saved into S3 (mandatory); snapshot frequency (snapshot rotation)<br>When EC2 restarts: attach and re-attach the EBS volume<br>EC2 security groups can be stacked to form other security groups<br>By default, EC2 instances don’t have a public IP unless you choose the optional setting<br>The SSH private key is not stored in AWS<br>Instance metadata: used to retrieve information about the instance (an HTTP service at the link-local “magic” IP 169.254.169.254 returns the current instance’s information)</p><p>9:30am - 10:15am EC2 Immersion Lab<br>10:15am - 10:30am Break</p><h1 id="networking-in-aws"><a class="markdownIt-Anchor" href="#networking-in-aws"></a> Networking in AWS</h1><p>Security Groups<br>VIF (Direct Connect virtual interface)?<br>DX (Direct Connect) location</p><p>11:15am - 12:00pm VPC Immersion Lab<br>12:00pm - 1:00pm Innovation at scale video</p><h1 id="storage-on-aws"><a class="markdownIt-Anchor" href="#storage-on-aws"></a> Storage on AWS</h1><p>1:45pm - 2:15pm S3 Immersion Lab<br>2:15pm - 2:30pm Break / Q &amp; A<br>2:30pm - 3:15pm Security Essentials<br>3:15pm - 3:45pm IAM Hands on Lab<br>3:45pm - 4:00pm Break / Q &amp; A<br>4:00pm - 5:00pm AWS Architecture Best Practices</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - EC2</title>
      <link href="2018/02/28/markdown/AWS/AWS2018/04_EC2/"/>
      <url>2018/02/28/markdown/AWS/AWS2018/04_EC2/</url>
      
        <content type="html"><![CDATA[<h1 id="login-ec2"><a class="markdownIt-Anchor" href="#login-ec2"></a> Log in to EC2</h1><ul><li>For Ubuntu AMIs the user is ubuntu@hostname instead of ec2-user@hostname</li></ul><h1 id="016mp4-elastic-compute-cloud"><a class="markdownIt-Anchor" href="#016mp4-elastic-compute-cloud"></a> 016.mp4 Elastic Compute Cloud</h1><p>Region: within a Region, pricing, latency, and regulations are the same<br>Availability Zone: one or more data centers<br>Edge location: used by CloudFront</p><p>Purchase options:</p><ol><li>On-Demand Instances</li><li>Reserved Instances</li><li>Spot Instances (use spare capacity at off-peak times)</li></ol><p>Q: How do I select the right instance type?</p><p>Amazon EC2 instances are grouped into 5 families: General Purpose, Compute Optimized, Memory Optimized, Storage Optimized and Accelerated Computing instances.</p><ul><li>General Purpose Instances have memory to CPU ratios suitable for most general purpose applications and come with fixed performance (M5, M4) or burstable performance (T2);</li><li>Compute Optimized instances (C5, C4) have proportionally more CPU resources than memory (RAM) and are well suited for scale out compute-intensive applications and High Performance Computing (HPC) workloads;</li><li>Memory Optimized Instances (X1e, X1, R4) offer larger memory sizes for memory-intensive applications, including database and memory caching applications;</li><li>Accelerated Computing instances (P3, P2, G3, F1) take advantage of the parallel processing capabilities of NVIDIA Tesla GPUs for high performance computing and machine/deep learning; GPU Graphics instances (G3) offer high-performance 3D graphics capabilities for applications using OpenGL and DirectX; F1 instances deliver Xilinx FPGA-based reconfigurable computing;</li><li>Storage Optimized Instances (H1, I3, D2) provide very high, low-latency I/O capacity using SSD-based local instance storage for I/O-intensive applications, while D2 and H1, the dense-storage and HDD-storage instances, provide high local storage 
density and sequential I/O performance for data warehousing, Hadoop and other data-intensive applications.</li></ul><h1 id="019-hands-on-connect-to-ec2-widnows-instance"><a class="markdownIt-Anchor" href="#019-hands-on-connect-to-ec2-widnows-instance"></a> 019 hands-on: connect to an EC2 Windows instance</h1><p>Differences from creating a Linux instance:</p><ul><li><strong>RDP (3389)</strong> must be allowed in the security group (inbound rule)</li><li>The Windows Administrator password is randomly generated; you still download a key pair (PEM), just like for a Linux instance</li><li>Use the PEM in the console to decrypt and retrieve the Administrator user name and password</li></ul><h2 id="ec2config"><a class="markdownIt-Anchor" href="#ec2config"></a> EC2Config</h2><p>About the EC2Config service (succeeded by EC2Launch)<br><a href="http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_WinAMI.html" target="_blank" rel="noopener">http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_WinAMI.html</a></p><ul><li>A built-in service that enables the instance to work with the AWS cloud (for example, retrieving instance metadata)</li><li>Can be used to initialize the Windows instance</li><li>Can run management tasks on the instance (for example, mounting EBS volumes)</li><li>Note: 
EC2Launch replaces EC2Config on Windows Server 2016 AMIs.</li></ul><h1 id="020-hands-on-connect-to-ec2-via-macos-client"><a class="markdownIt-Anchor" href="#020-hands-on-connect-to-ec2-via-macos-client"></a> 020 hands-on: connect to EC2 from a macOS client</h1><p>For Linux instances</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">chmod 400 xxx.pem</span><br><span class="line">ssh -i "/path/to/pem/xxx.pem" ec2-user@remotehostname</span><br></pre></td></tr></table></figure><p>For Windows instances<br>download the CoRD RDP client</p><h1 id="021-create-a-custom-ami"><a class="markdownIt-Anchor" href="#021-create-a-custom-ami"></a> 021 create a custom AMI</h1><p>iptables rerouting (redirect port 80 to 8080)</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8080</span><br></pre></td></tr></table></figure><p>After everything is set up:<br>select the instance -&gt; Actions -&gt; Create Image</p><p>Images -&gt; AMIs<br>Select the newly created image; you can:<br>make it public<br>copy it to another region and launch an instance from this custom AMI<br>Open question: how is an AMI billed (public or not)?</p><h1 id="022-ebs-elastic-block-storage-for-ec2"><a class="markdownIt-Anchor" href="#022-ebs-elastic-block-storage-for-ec2"></a> 022 EBS (Elastic Block Store) for EC2</h1><p>EBS (Elastic Block Store)</p><ul><li>attached to an EC2 instance</li><li>persists independently of the EC2 instance’s lifetime</li><li>pay only for the storage you provision</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">lsblk</span><br><span class="line">sudo file -s /dev/xvdb</span><br><span class="line">sudo mkfs -t ext4 /dev/xvdb</span><br><span class="line">sudo file -s /dev/xvdb</span><br><span class="line">sudo mkdir /data</span><br><span class="line">sudo mount /dev/xvdb /data</span><br><span class="line">sudo nano /etc/fstab</span><br><span class="line">sudo mount -a</span><br><span class="line">sudo touch /data/test.txt</span><br></pre></td></tr></table></figure><ul><li>Create a snapshot of the EBS volume</li><li>Create a new EBS volume from the snapshot just created</li><li>Launch another EC2 instance, selecting that snapshot when adding an extra EBS volume to it</li></ul><p>Instance Storage:<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html</a><br>Not available on low-end instance types<br>Instance store volumes must be attached through a block device mapping when the instance is launched.</p><h2 id="ebs-snapshots"><a class="markdownIt-Anchor" href="#ebs-snapshots"></a> EBS snapshots</h2><p>Point-in-time snapshots<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html</a></p><ul><li>Creation is asynchronous; the status stays pending until the snapshot is complete</li><li>For a consistent snapshot, unmount the EBS volume before issuing the snapshot command (see the example below for /dev/sdh), stop the instance if the volume is the root device, or otherwise guarantee that nothing writes to the volume. Once the command has been issued, the volume can be remounted while the snapshot status is still “pending”.</li><li>You can have multiple pending snapshots taken at different points in time. 
However, the total number of pending snapshots is limited.</li><li>EBS snapshots are only available through the Amazon EC2 APIs, not the S3 APIs. Although snapshots are stored in S3, you cannot use the S3 API to access their contents; they must still be accessed through EC2.</li><li>Each snapshot has a unique ID, which can be used to restore a volume</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">umount -d /dev/sdh</span><br></pre></td></tr></table></figure><ul><li>Commands to create an EBS snapshot (the syntax differs between the AWS CLI and the AWS Tools for Windows PowerShell):</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">aws ec2 create-snapshot --volume-id vol-1234567890abcdef0 --description &quot;This is my root volume snapshot.&quot;</span><br></pre></td></tr></table></figure><figure class="highlight powershell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">PS C:\&gt; <span class="built_in">New-EC2Snapshot</span> <span class="literal">-VolumeId</span> vol<span class="literal">-12345678</span> <span class="literal">-Description</span> <span class="string">"This is a test"</span></span><br></pre></td></tr></table></figure><h2 id="如何使用tag管理ec2的权限"><a class="markdownIt-Anchor" href="#如何使用tag管理ec2的权限"></a> Using tags to manage EC2 permissions</h2><ul><li>Create a tag named UserName on the EC2 instance, whose value is the name of the user allowed to access that instance</li><li>Attach the policy to the users</li><li>When evaluating permissions, IAM compares the requesting user’s name ($&#123;aws:username&#125;) with the value of the instance’s UserName tag and grants access if they are equal</li></ul><p>Example:</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  <span class="attr">"Version"</span>: <span class="string">"2012-10-17"</span>,</span><br><span class="line">  <span class="attr">"Statement"</span>:</span><br><span class="line">  [</span><br><span class="line">    &#123;</span><br><span class="line">      <span class="attr">"Effect"</span>: <span class="string">"Allow"</span>,</span><br><span class="line">      <span class="attr">"Action"</span>: <span class="string">"ec2:*"</span>,</span><br><span class="line">      <span class="attr">"Resource"</span>: <span class="string">"*"</span>,</span><br><span class="line">      <span class="attr">"Condition"</span>: &#123;</span><br><span class="line">        <span class="attr">"StringEquals"</span>: &#123;</span><br><span class="line">          <span class="attr">"ec2:ResourceTag/UserName"</span>: <span class="string">"$&#123;aws:username&#125;"</span></span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="placement-groups"><a class="markdownIt-Anchor" href="#placement-groups"></a> Placement groups</h2><p>A placement group is a configuration that, once created, tells AWS how to place EC2 instances when they are launched.</p><ul><li>Cluster placement group: instances in the group are placed in the same AZ, with optimized networking between them.</li><li>Spread placement group: instances in the group are placed on distinct underlying hardware, reducing the risk that a single hardware failure affects several of them.</li><li>A running instance cannot change its placement group; stop it first, then change the placement group setting. A placement group cannot be deleted while it still contains instances.</li><li>Tags are not supported (at the time of writing): you can’t create tags for a placement group</li></ul><p><a 
href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html</a></p><h1 id="related-reading"><a class="markdownIt-Anchor" href="#related-reading"></a> Related Reading</h1><h2 id="ssd-backed-ebs-and-hdd-backed-ebs"><a class="markdownIt-Anchor" href="#ssd-backed-ebs-and-hdd-backed-ebs"></a> SSD-backed EBS and HDD-backed EBS</h2><p><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html</a></p><ul><li><p>SSD-backed volumes are optimized for transactional workloads involving frequent read/write operations with small I/O size, where the dominant performance attribute is IOPS</p><ul><li>GP2 (General Purpose SSD): the default general-purpose type; IOPS scales between 100 and 10,000; max throughput 160 MiB/s</li><li>Provisioned IOPS SSD (io1): up to 32,000 IOPS; max throughput 500 MiB/s</li><li>Suitable for: transactional, IOPS-intensive database workloads, boot volumes, and workloads that require high IOPS.</li></ul></li><li><p>HDD-backed volumes are optimized for large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS</p><ul><li>Throughput Optimized HDD (st1): max 500 IOPS; max throughput 500 MiB/s<ul><li>Suitable for: MapReduce, Kafka, log processing, data warehouse, and ETL workloads.</li></ul></li><li>Cold HDD (sc1): max 250 IOPS; max throughput 250 MiB/s<ul><li>Suitable for: less frequently accessed workloads with large, cold datasets.</li></ul></li></ul></li><li><p>IOPS (Input/Output Operations Per Second) is the number of read/write (I/O) operations performed per second; it is mostly used for databases and measures random-access performance.</p></li><li><p>IOPS limits apply per volume: each attached volume has an IOPS ceiling that depends on its type. Attaching multiple volumes raises the aggregate IOPS, but each instance has an overall IOPS ceiling regardless of volume type; HDD volumes may simply need more attached volumes to reach it. The per-instance limit is 80,000 IOPS.</p></li><li><p>Throughput works the same way: each instance has a ceiling. Attaching multiple volumes raises aggregate throughput but cannot exceed the per-instance limit of 1,750 MiB/s.</p></li><li><p>How to check the current IOPS on Linux: 
r/s is reads per second; w/s is writes per second</p></li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">iostat -x</span><br></pre></td></tr></table></figure><ul><li>GP2 SSD volumes have burstable performance, governed by an I/O credit mechanism:<ul><li>GP2 volume size ranges from 1 GiB to 16 TiB; the larger the volume, the higher its baseline performance: normally 3 IOPS per GiB, with a minimum of 100 IOPS (1 GiB) and a maximum of 10,000 IOPS (16 TiB)</li><li>Every GP2 volume starts with an initial I/O credit balance of 5.4 million</li><li>BurstDuration = CurrentCredit / (BurstIOPS - 3 * VolumeSizeInGiB)</li><li>Initial credit == max credit == 5.4 million</li><li>BurstIOPS is the current IOPS above the assigned baseline, capped at 3,000; a volume of 1,000 GiB or more already has a baseline of at least 3,000 IOPS, so it does not burst.</li><li>While the volume’s IOPS stays below its baseline, it accumulates credits (up to the 5.4 million maximum):<ul><li>CurrentCredit += (baseIOPS - actualIOPS) * Duration;</li></ul></li></ul></li></ul><p>Pre-warming = Initialization (historically done by writing zeros to a new, empty volume; this erases all data on the device)</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> dd <span class="keyword">if</span>=/dev/zero of=/dev/xvdX bs=1M</span></span><br></pre></td></tr></table></figure><p>Encryption standard</p><p>EBS encryption uses the industry-standard AES-256 algorithm.</p><p>Get the instance’s metadata (from inside the instance)</p><blockquote><p><a href="http://169.254.169.254/latest/meta-data/" target="_blank" rel="noopener">http://169.254.169.254/latest/meta-data/</a></p></blockquote><p><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html</a><br><a href="https://www.ibm.com/developerworks/cn/cloud/library/1620-openstack-metadata-service/" target="_blank" rel="noopener">https://www.ibm.com/developerworks/cn/cloud/library/1620-openstack-metadata-service/</a></p><p>“Launch More Like This” feature:<br>which settings are copied from the current instance and which are not<br><a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launch-more-like-this.html" target="_blank" 
rel="noopener">https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launch-more-like-this.html</a></p><ul><li>Copied: RAM disk ID; EBS-optimized setting; public IP address (setting)</li><li>Not copied: number of network interfaces; storage configuration.</li></ul><h1 id="不同region的ec2之间通信收费问题"><a class="markdownIt-Anchor" href="#不同region的ec2之间通信收费问题"></a> Data transfer charges between EC2 instances in different regions</h1><ul><li>EC2 data in and data out are charged separately. So for traffic from EC2 A to EC2 B (A -&gt; B), A’s region charges for data out and B’s region charges for data in</li></ul><h1 id="ec2-local-store-的persist问题"><a class="markdownIt-Anchor" href="#ec2-local-store-的persist问题"></a> Persistence of the EC2 local instance store</h1><p>The instance store is the storage that comes with the EC2 instance. EC2 is billed only while the instance is running, not in other states (EBS is billed separately).</p><p>The data in an instance store persists only during the lifetime of its associated instance. If an instance reboots (intentionally or unintentionally), data in the instance store persists. However, data in the instance store is lost under the following circumstances:</p><ul><li><p>The underlying disk drive fails</p></li><li><p>The instance stops</p></li><li><p>The instance terminates</p></li></ul><p>Therefore, do not rely on instance store for valuable, long-term data. Instead, use more durable data storage, such as Amazon S3, Amazon EBS, or Amazon EFS.</p><ul><li>EC2 Compute Unit<br>Q: What is an “EC2 Compute Unit” and why did you introduce it?</li></ul><p>Transitioning to a utility computing model fundamentally changes how developers have been trained to think about CPU resources. Instead of purchasing or leasing a particular processor to use for several months or years, you are renting capacity by the hour. Because Amazon EC2 is built on commodity hardware, over time there may be several different types of physical hardware underlying EC2 instances. 
Our goal is to provide a consistent amount of CPU capacity no matter what the actual underlying hardware.</p><ul><li><p>An Elastic IP address is charged when it is not in use (not associated with a running instance)</p><ul><li>An Elastic IP address is static; a public IP address is dynamic and changes whenever the instance is stopped and started</li></ul></li><li><p>Reverse DNS<br>Reverse DNS: From IP to Domain<br>A special PTR-record type is used to store reverse DNS entries. The name of the PTR-record is the IP address with the segments reversed + “.in-addr.arpa”.<br>For example, the reverse DNS entry for IP 1.2.3.4 would be stored as a PTR-record for “4.3.2.1.in-addr.arpa”.<br>Reverse DNS can be requested separately.</p></li><li><p>Nitro Hypervisor</p></li><li><p>The next-generation virtualization technology; new AWS instance types are built on it. The total number of EBS volumes plus network interfaces (VPC ENIs) that can be attached to an instance equals the number of PCI devices, max = 27</p></li><li><p>The NVMe device names used by Linux-based operating systems differ from the parameters for EBS volume attachment requests and block device mapping entries such as /dev/xvda and /dev/xvdf. NVMe devices are enumerated by the operating system as /dev/nvme0n1, /dev/nvme1n1, and so on</p></li><li><p>Enhanced networking</p><ul><li>Requires an HVM instance</li><li>Requires a specific driver; some AWS instance types include it, otherwise you must install and configure it manually.</li><li>The instance must be launched in a VPC</li></ul></li><li><p>CloudWatch</p><ul><li><p>Once enabled, metrics are collected every minute by default; metric data is kept for two weeks by default (even after the monitored resource is terminated or deleted); the command-line tool mon-get-stats retrieves metric data so it can be saved to S3 or Amazon SimpleDB</p><ul><li>About SimpleDB vs DynamoDB: in short, DynamoDB is most likely meant to replace SimpleDB</li></ul><blockquote><p><a href="https://stackoverflow.com/questions/8961333/amazon-simpledb-vs-amazon-dynamodb" target="_blank" rel="noopener">https://stackoverflow.com/questions/8961333/amazon-simpledb-vs-amazon-dynamodb</a></p></blockquote></li><li><p>Pricing does not depend on the instance type</p></li><li><p>When a graph is displayed at five-minute intervals, each point is the average over those five minutes, so the curve may differ from the one displayed at one-minute intervals.</p></li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p><a href="https://youtu.be/agQMFIWr2h4" target="_blank" rel="noopener">https://youtu.be/agQMFIWr2h4</a></p></blockquote><p>Deep Dive EC2 Performance</p><p><img 
src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/04_EC2_History.png?raw=true" alt="EC2 History"></p></li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> EC2 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>InfoQ Readings</title>
      <link href="2018/02/07/markdown/Trending/InfoQ/"/>
      <url>2018/02/07/markdown/Trending/InfoQ/</url>
      
        <content type="html"><![CDATA[<p>Migrating Batch ETL to Stream Processing: A Netflix Case Study with Kafka and Flink<br><a href="https://www.infoq.com/articles/netflix-migrating-stream-processing" target="_blank" rel="noopener">https://www.infoq.com/articles/netflix-migrating-stream-processing</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> Tech Reading </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - IAM</title>
      <link href="2018/02/07/markdown/AWS/AWS2018/03_IdentityAndAccessManagement/"/>
      <url>2018/02/07/markdown/AWS/AWS2018/03_IdentityAndAccessManagement/</url>
      
        <content type="html"><![CDATA[<h1 id="iam-overview"><a class="markdownIt-Anchor" href="#iam-overview"></a> IAM Overview</h1><p>IAM: <strong>Identity and Access Management</strong></p><h2 id="010mp4-overview"><a class="markdownIt-Anchor" href="#010mp4-overview"></a> 010.mp4 overview</h2><h2 id="011mp4"><a class="markdownIt-Anchor" href="#011mp4"></a> 011.mp4</h2><ul><li>Understand the difference between AWS IAM and customer IAM</li><li>Understand the difference between an AWS account and IAM users</li><li>IAM is a service.</li><li>IAM controls access through policies, which are organized into “statements”; each statement includes a resource (e.g., a table), an action (e.g., accessing the database) and an effect (e.g., Allow)</li><li>Security compliance: Payment Card Industry (PCI) Data Security Standard (DSS)</li><li>Auditing: using CloudTrail</li><li>Credential report: a downloadable CSV file (a new report can be generated at most once every 4 hours, so it may be out of date)</li></ul><h3 id="user"><a class="markdownIt-Anchor" href="#user"></a> User</h3><p>How a user is identified:</p><ul><li>ARN (Amazon Resource Name), format:<br>arn:aws:iam::[accountIDNum]:user/Bill<ul><li>lots of examples:</li></ul></li></ul><blockquote><p><a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html</a></p></blockquote><ul><li>ID: if the user is created from the command line, the user ID is returned</li><li>a plain, unique user name</li><li>never give out root access</li></ul><p>Credential types:</p><ul><li>password</li><li>access key</li><li>SSH key</li><li>server certificates</li></ul><p>Renaming a user:</p><ul><li>not supported in the console, but possible through the CLI, PowerShell or the API</li></ul><h3 id="group"><a class="markdownIt-Anchor" href="#group"></a> Group</h3><ul><li><strong>Groups cannot be nested</strong></li><li>At most 100 groups per account</li></ul><h3 id="roles"><a class="markdownIt-Anchor" href="#roles"></a> Roles</h3><ul><li>EC2 instances use roles to access other AWS services</li><li>A role can be granted to users in one account to access resources in another account</li></ul><p>A role definition includes:</p><ul><li>RoleName: unique name 
within the current AWS account</li><li>RoleARN: unique across all of AWS<ul><li>arn:aws:iam::&lt;uniqueAccountId&gt;:role/&lt;roleName&gt;</li></ul></li><li>Delegation:<ul><li>Create a role in the account that owns the resource; attach a policy that defines the role’s access; set up a trust relationship (who is allowed to assume the role)</li></ul></li></ul><h3 id="identity-federation"><a class="markdownIt-Anchor" href="#identity-federation"></a> Identity Federation</h3><ul><li>Normally an AWS account allows at most 5,000 IAM users; with identity federation this 5,000 limit no longer applies.</li><li>Ways to do identity federation<ul><li>Amazon Cognito</li><li>OAuth 2.0</li><li>SAML 2.0</li><li>LDAP or AD</li></ul></li></ul><h1 id="012mp4"><a class="markdownIt-Anchor" href="#012mp4"></a> 012.mp4</h1><ul><li>Create users and groups<br>Policies can be attached to groups, and users are added to group(s)<br>Bulk creation of users is supported (each user can have an access key ID, like the public key in a key pair, and a secret access key, like the private key, which acts as the password)</li><li>Create an account password policy<ul><li>The password policy is configured at the account level and applies to the IAM users in that account</li><li>All IAM users in an account sign in through that account’s unique IAM sign-in URL.</li></ul></li><li>Create a role<br>A role can represent a service in the current account; creating a role gives that service access to other resources. 
(Create a role representing a certain type of resource, attach a policy to the role to grant access to resources, and create a trust relationship that allows other resources or users to assume the role.)<br>A role can delegate access to resources across different AWS accounts</li><li>Download the credential report</li></ul><h2 id="identity-based-policy"><a class="markdownIt-Anchor" href="#identity-based-policy"></a> Identity-based policy</h2><ul><li>Identity-based policy: a policy that can be attached to a user, group or role<ul><li>Managed policy: defined once and reusable<ul><li>AWS managed</li><li>customer managed</li></ul></li><li>Inline policy: embedded directly in a user, group or role definition; cannot be reused</li></ul></li></ul><p>AWS managed policies for job functions<br><a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html" target="_blank" rel="noopener">https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html</a></p><h1 id="013mp4-trusted-advisor-service"><a class="markdownIt-Anchor" href="#013mp4-trusted-advisor-service"></a> 013.mp4 Trusted Advisor Service</h1><p>Trusted Advisor checks</p><ul><li>Cost optimization<br>checks for unused EC2, RDS, Elastic Load Balancing, EBS, etc.</li><li>Performance<br>checks for service bottlenecks</li><li>Security<br>checks all security-related configuration</li><li>Fault tolerance<br>checks all failover-related configuration</li></ul><p>Can be set up to send regular notifications to the relevant email addresses.</p><h1 id="014mp4-iam-best-practise"><a class="markdownIt-Anchor" href="#014mp4-iam-best-practise"></a> 014.mp4 IAM Best Practices</h1><ul><li>Enable MFA and reduce use of the root account<ul><li>supports Google Authenticator</li></ul></li><li>Grant least privilege (start from nothing)<ul><li>deny by default</li><li>avoid wildcard <code>*:*</code> policies</li><li>use policy templates where possible</li></ul></li><li>Create individual users</li><li>Manage permissions with groups</li><li>Use IAM roles to manage permissions across accounts and between services in your account<ul><li>external 
ID condition in the policy to grant access to third parties</li></ul></li><li>Restrict further with conditions (for example, require MFA when calling certain APIs)<ul><li>add conditions to the policy</li></ul></li><li>Use a strong password policy (expiration, length, no reuse)</li><li>Rotate credentials<ul><li>use the credential report to identify credentials that need rotating</li><li>allow users to rotate their own credentials</li></ul></li><li>Enable AWS CloudTrail</li><li>AWS Key Management Service</li><li>VPC security</li></ul><h2 id="exercise"><a class="markdownIt-Anchor" href="#exercise"></a> Exercise:</h2><ul><li>ARN format.</li><li>How to define a policy for a website hosted on AWS</li><li>Maximum number of IAM access keys per user? 2</li><li>An account can have only one alias</li><li>Users need their own access keys (not passwords!) to make programmatic calls to AWS using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or direct HTTP calls using the APIs for individual services.</li><li>Policy sample definition (009.html)</li></ul><h2 id="others"><a class="markdownIt-Anchor" href="#others"></a> Others</h2><p>Sign-in page URL:<br>https://My_AWS_Account_ID.signin.aws.amazon.com/console/</p><h2 id="integrate-with-ad-best-practise"><a class="markdownIt-Anchor" href="#integrate-with-ad-best-practise"></a> Integrate with AD - Best Practices</h2><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/03_IntegrateWithMicrosoftAD.png?raw=true" alt="image of integrate with AD"></p><blockquote><p><a href="https://youtu.be/Iu-CpNFMELs" target="_blank" rel="noopener">https://youtu.be/Iu-CpNFMELs</a></p></blockquote><p>3 options:</p><ul><li>Option 1: join EC2 to the on-premises AD domain<ul><li>Many ports must be opened for EC2 in the cloud</li><li>AD Connector<ul><li>Initial solution: forward LDAP to the on-premises AD</li><li>EC2 domain join</li></ul></li></ul></li><li>Option 2: run AD on EC2 (PaaS)<ul><li>Trust model or replication model</li><li>Supports NetBIOS name 
resolution</li></ul></li><li>Option 3: use AWS AD (SaaS) (released in Dec 2015)<ul><li>Trust model: no trust, no replication</li><li>Works well with other Microsoft services on AWS (e.g., MS SQL Server)</li><li>One-way trust and two-way trust<ul><li>One-way trust is used to access cloud resources with on-premises AD accounts</li><li>Two-way trust is used when cloud resources need to access on-premises resources (say, a printer)</li></ul></li><li>Limitations (as of 2016): no LDAPS support; max 50,000 users</li></ul></li></ul><h1 id="federation"><a class="markdownIt-Anchor" href="#federation"></a> Federation</h1><h2 id="options"><a class="markdownIt-Anchor" href="#options"></a> Options</h2><h3 id="option-1-simple-ad"><a class="markdownIt-Anchor" href="#option-1-simple-ad"></a> Option 1, Simple AD</h3><ul><li>Microsoft AD compatible;<ul><li>Can’t set up trust between Microsoft AD and Simple AD</li><li>Doesn’t support schema extension, MFA or LDAPS</li></ul></li><li>Cheapest option, suitable for &lt;5000 users</li></ul><h3 id="option-2-microsoft-ad"><a class="markdownIt-Anchor" href="#option-2-microsoft-ad"></a> Option 2, Microsoft AD</h3><ul><li>Supports trust with Microsoft AD</li><li>Doesn’t support schema extension or MFA</li><li>Suitable for &gt;5000 users and when trust with an on-premises Microsoft AD is needed</li></ul><h3 id="option-3-ad-connector"><a class="markdownIt-Anchor" href="#option-3-ad-connector"></a> Option 3, AD Connector</h3><ul><li>Proxy service to use the on-premises Microsoft AD directly</li><li>Supports MFA</li></ul><h3 id="option-4-saml-security-assertion-markup-language"><a class="markdownIt-Anchor" href="#option-4-saml-security-assertion-markup-language"></a> Option 4: SAML (Security Assertion Markup Language)</h3><ul><li>The identity provider and AWS establish a trust relationship in advance</li><li>The login flow uses the identity provider’s service to do the authentication</li></ul><h2 id="how-to-plan-the-federation"><a class="markdownIt-Anchor" href="#how-to-plan-the-federation"></a> How to plan the federation</h2><p>Choose your SAML 
provider: for example, ADFS with Active Directory</p><h3 id="federation-high-level-steps"><a class="markdownIt-Anchor" href="#federation-high-level-steps"></a> Federation High Level Steps</h3><p><img src="https://github.com/racheliurui/markdown/blob/master/AWS/AWS2018/images/03_IAM_SAML.png?raw=true" alt="SAML process"></p><ul><li>Prepare a SAML provider in your network</li><li>Configure the SAML provider in AWS IAM</li><li>Configure roles for your federated users</li><li>Create groups in AD matching the IAM roles</li><li>Configure the SAML IdP (identity provider) &amp; create assertions for the SAML auth response</li><li>Post the SAML assertion to the AWS sign-in URL</li></ul><p>So when a user makes a request, the user is mapped to an AD group, and that group is mapped to an IAM role.<br>When a user belongs to multiple groups that map to multiple IAM roles, the user can choose which role to sign in with.</p><p>You can use the AWS CLI with SAML (the session lasts 60 minutes by default)</p><h1 id="aws-active-directory-deep-dive"><a class="markdownIt-Anchor" href="#aws-active-directory-deep-dive"></a> AWS Active Directory deep dive</h1><ul><li>There are 2 options when creating a directory</li><li>Option 1: create an AWS Simple AD directory<ul><li>When creating it, specify a VPC and 2 subnets (in different AZs)</li><li>For Linux instances to join the Simple AD domain, install SSSD and use it to join the domain (requires a reboot)</li><li>For Windows, you can specify the Simple AD directory when creating the EC2 instance</li></ul></li><li>Option 2: create an AD Connector to the on-premises AD<ul><li>Points to Microsoft AD (requires a VPN); also requires a VPC and 2 subnets.</li></ul></li></ul><h1 id="references"><a class="markdownIt-Anchor" href="#references"></a> References</h1><blockquote><p>AWS AD federation<br><a href="https://www.youtube.com/watch?v=ytSjsEER-y0" target="_blank" rel="noopener">https://www.youtube.com/watch?v=ytSjsEER-y0</a></p></blockquote><blockquote><p>Best practices for integrating with MS AD (Nov 2016)<br><a 
href="https://youtu.be/Iu-CpNFMELs" target="_blank" rel="noopener">https://youtu.be/Iu-CpNFMELs</a></p></blockquote><blockquote><p>AWS AD deep dive<br><a href="https://youtu.be/CY-xvo8Cc54" target="_blank" rel="noopener">https://youtu.be/CY-xvo8Cc54</a></p></blockquote><h1 id="progressive-journey-through-aws-iam-federations-options-2015-sec07"><a class="markdownIt-Anchor" href="#progressive-journey-through-aws-iam-federations-options-2015-sec07"></a> Progressive Journey Through AWS IAM Federations Options - 2015 (SEC07)</h1><blockquote><p><a href="https://youtu.be/-XARG9W2bGc" target="_blank" rel="noopener">https://youtu.be/-XARG9W2bGc</a></p></blockquote><h2 id="saml-primer-security-assertion-markup-language"><a class="markdownIt-Anchor" href="#saml-primer-security-assertion-markup-language"></a> SAML Primer – Security Assertion Markup Language</h2><ul><li>Configuration time:<ul><li>The identity provider and the service provider exchange metadata in advance</li></ul></li><li>Run time:<ul><li>Cryptographically trusted assertion (login flow)</li></ul></li></ul><h2 id="demo-automating-onboarding"><a class="markdownIt-Anchor" href="#demo-automating-onboarding"></a> Demo – Automating onboarding</h2><ul><li><p>Use Python scripts to create providers, roles and policies in AWS</p><ul><li>a Python script to create the IAM provider</li><li>a Python script to create roles with inline policies</li><li>Python to generate an LDIF (LDAP Data Interchange Format) file and load group definitions back into LDAP</li></ul></li><li><p>Use Python scripts to create group definitions in the directory</p></li></ul><h2 id="aws-business-partners-solution-demo"><a class="markdownIt-Anchor" href="#aws-business-partners-solution-demo"></a> AWS Business Partner’s Solution Demo</h2><h1 id="nova"><a class="markdownIt-Anchor" href="#nova"></a> Nova</h1><ul><li>A solution quite similar to Ranger</li><li>A user’s authentication request to Nova is routed to AD</li><li>AD performs the authentication and returns the user ID to Nova.</li><li>Nova queries the user’s 
group information from AD and maps it to the AWS groups saved in its database</li><li>Nova asks the user which role they want to log in with</li><li>Nova requests STS credentials from IAM and returns them to the user (either to log in to the AWS console or as temporary access keys)</li></ul><h1 id="nova-2"><a class="markdownIt-Anchor" href="#nova-2"></a> Nova 2</h1><ul><li>Adds another dimension by using tags to label teams / resources</li></ul><h1 id="nova-3"><a class="markdownIt-Anchor" href="#nova-3"></a> Nova 3</h1><ul><li>Allows users to select which account, application and environment to work on</li></ul><h1 id="basics"><a class="markdownIt-Anchor" href="#basics"></a> Basics</h1><p>Identity-based policy vs. resource-based policy:<br>Identity-based: for a given identity (user), defines which resources it can access<br>Resource-based: for a given resource, defines who has access to it</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
            <tag> IAM </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Overview</title>
      <link href="2018/02/05/markdown/AWS/AWS2018/01_Overview/"/>
      <url>2018/02/05/markdown/AWS/AWS2018/01_Overview/</url>
      
        <content type="html"><![CDATA[<h1 id="terminology"><a class="markdownIt-Anchor" href="#terminology"></a> Terminology</h1><ul><li>16 Regions : 不同的Region价格不同，部署的服务不同</li><li>42 Availability Zone : zone之间的故障是完全隔离的</li><li>50 Edge Locations: 缓存；加速</li></ul><h1 id="services"><a class="markdownIt-Anchor" href="#services"></a> Services</h1><p>FAAS (function as a service; serverless service)<br>问题：那些服务是serverless的，哪些不是？ 给定一个场景，需要哪些服务的组合？<br><a href="https://aws.amazon.com/serverless/" target="_blank" rel="noopener">https://aws.amazon.com/serverless/</a><br>AWS serveless service： lambda, dynamodb, api gateway , S3, AWS Step Functions,SNS,SQS, Kinesis, Athena （interactive query against big data), tools and services (city9)</p><h2 id="compute-services"><a class="markdownIt-Anchor" href="#compute-services"></a> Compute services</h2><ul><li>EC2</li><li>ECS （docker）</li><li>Elastic BeanStalk： 自动部署环境。</li><li>ELB （balancer）</li><li>Autoscaling</li><li>Lambda</li></ul><h2 id="storage-services"><a class="markdownIt-Anchor" href="#storage-services"></a> Storage Services</h2><ul><li>S3</li><li>Glacier</li><li>EBS (Elastic Block Storage): attach to EC2 instances<br>EBS to EC2 , n:1</li><li>EFS (Elastic File Storage) to EC2, n:n</li><li>Storage Gateway: 用来连接和同步s3 bucket with objects with 企业私有的数据中心。</li><li>Snowball Device：比storage gateway快。</li></ul><p>S3在VPC（virtual private cloud）之外。VPC需要创建Endpoint 去连S3 bucket with objects（存储服务的映射），S3 bucket再连Glacier （通过Glacier Vault）， 从而定义archive的规则。</p><h2 id="database"><a class="markdownIt-Anchor" href="#database"></a> database</h2><ul><li>RDS ：oracle ， sql server， mysql， postgres, Aurora, MariaDB (MySQL community version)</li><li>Aurora： 企业版MySQL/PostgreSQL</li><li>Dynamodb： servless ， nosql</li><li>redshift： datawarehouse （petabytes）， based on postgres</li><li>ElastiCache （两种引擎可选：Redis， memcache）</li><li>AWS database migration service: Oracle -&gt; Aurora PostgreSQL</li></ul><p>Database sits in VPC</p><h2 
id="network-and-content-delivery"><a class="markdownIt-Anchor" href="#network-and-content-delivery"></a> network and content delivery</h2><p>VPC<br>Cloud Front: Caching (Edge)<br>Route53: Domain Name Services<br>Direct connect: connect private datacenter --&gt; AWS<br>ELB (also compute):<br>Best Practise: deploy one VPC into muti hi-Availability zones.</p><h2 id="management-tools"><a class="markdownIt-Anchor" href="#management-tools"></a> management tools</h2><ul><li>CloudFormation: Infrastruture as a code. (YAML or JSON to define Infrastruture, git control)</li><li>CloudWatch: Monitoring &amp; Alarms &amp; Trigger</li><li>Trusted Adviser: Expert System(scan existing Infrastruture, give advices), security, performance, cost,etc…</li><li>OpsWorks : chief receipt (similar with CloudFormation)</li><li>CloudTrail : security and auditing, monitor all the api calls, including all the management calls ; can be integrated with CloudWatch</li></ul><h2 id="messaging"><a class="markdownIt-Anchor" href="#messaging"></a> Messaging</h2><ul><li>Simple Queue Service (SQS): Serveless service</li><li>Simple Notification Service (SNS)</li><li>Simple Email Service (SES): bulk delivery of Email</li></ul><p>Process decoupling example,<br>请求放sqs，连cloudwatch，请求spike的时候，cloudwatch的alarm触发生成更多EC2实例（autoscaling），cloudwatch连SES通知用户。反之scale down。</p><h2 id="security-identitiy-compliance"><a class="markdownIt-Anchor" href="#security-identitiy-compliance"></a> Security &amp; Identitiy &amp; Compliance</h2><ul><li>IAM （ Identity &amp; Access Management ）： IAM users， groups， roles</li><li>Directory Service - authentication ： 3rd party integration （oauth2） 例如facebook ， google， AD， ldap</li><li>Certificate Manager - SSL Certificates</li><li>KMS （Encryption Key Management Service）</li><li>WAF （Web Application Firewall）：extra layer</li></ul><h2 id="analytics"><a class="markdownIt-Anchor" href="#analytics"></a> Analytics</h2><ul><li>Elastic Map Reduce （EMR）：hadoop as a service</li><li>ElasticSearch 
Service</li><li>Kinesis： data streaming</li><li>QuickSight： Data Visualization</li><li>Data Pipeline： Process and move data</li></ul><h1 id="aws-free-tier"><a class="markdownIt-Anchor" href="#aws-free-tier"></a> AWS Free Tier</h1><p>EC2 ： 750 hours totally /month<br>Storage： 5G</p><p>Services： 不过期。只有量的限制。</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>AWS - Handson Static Website</title>
      <link href="2018/02/05/markdown/AWS/AWS2018/02_BulletProofStaticWebsite/"/>
      <url>2018/02/05/markdown/AWS/AWS2018/02_BulletProofStaticWebsite/</url>
      
        <content type="html"><![CDATA[<h1 id="set-up-a-bulletproof-website-with-aws"><a class="markdownIt-Anchor" href="#set-up-a-bulletproof-website-with-aws"></a> set up a bulletproof website with AWS</h1><ul><li>demo-005 part2: use Route53 service to buy a domain</li><li>demo-006 part3:<ul><li>create S3 bucket, upload the files; host static website (and visit using raw s3 url);</li><li>create another bucket with www naming and redirect the request to naked domain (<a href="http://domain.com" target="_blank" rel="noopener">domain.com</a>).</li></ul></li></ul><p>Tips, set the correct MIME type for uploaded files<br><a href="https://developer.mozzila.org/en-US" target="_blank" rel="noopener">https://developer.mozzila.org/en-US</a><br>search for “complete list of MIME type”</p><ul><li>demo-007 part4: use “Certificate Manager” service to create Certificate (give the domain as <a href="http://domain.com" target="_blank" rel="noopener">domain.com</a> and *.domain.com )</li><li>demo-008 part5: create distribution using cloudfront service to help with security(D-DOS),performance,fail over,attach certs<ul><li>default TTL : default is 24 hours, means refresh from S3 every 24 hours</li><li>alternative domain names: the domain name purchased</li><li>SSL certificate: custom SSL certificate (created in previous demo)</li><li>default root object: like index.html</li><li>how to manually trigger a refresh: create a invalidation, and using “*” to specify invalidate everything.</li></ul></li><li>demo-009 part6: go back to “Route53” and configure the “Hosted Zones”<ul><li>Create a “A-IPV4” typed “record set”, set the url to <a href="http://domain.com" target="_blank" rel="noopener">domain.com</a>, and alias target to the cloudfront endpoint.</li><li>Create another “CName” typed “record set”, url to <a href="http://www.domain.com" target="_blank" rel="noopener">www.domain.com</a>, and non-alias pointing to naked domain which will be routed to cloudfront 
endpoint.</li></ul></li></ul><h2 id="background-www-and-naked-domain-names"><a class="markdownIt-Anchor" href="#background-www-and-naked-domain-names"></a> background www and naked domain names</h2><p><a href="https://www.sitepoint.com/domain-www-or-no-www/" target="_blank" rel="noopener">https://www.sitepoint.com/domain-www-or-no-www/</a><br>In short words, www is the prefix to indicate the url is hosted on internet (in olden days). Now it’s not so necessary, if you skip the www prefix, then your host name is called “Naked”. Anyway, people might choose to be compatible to both www and naked domain.</p>]]></content>
      
      
      
        <tags>
            
            <tag> AWS </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 11</title>
      <link href="2018/02/04/markdown/Trending/MachineLearning/MachineLearning_11/"/>
      <url>2018/02/04/markdown/Trending/MachineLearning/MachineLearning_11/</url>
      
        <content type="html"><![CDATA[<h1 id="photo-ocr-optical-character-recognition"><a class="markdownIt-Anchor" href="#photo-ocr-optical-character-recognition"></a> photo OCR (Optical Character Recognition)</h1><p><strong>Pipeline</strong></p><p>解决复杂问题的思路。</p><p>以photo OCR为例， pipeline为：</p><p>图片–》识别文字区域–》文字分离–》单个文字识别</p><p>sliding window的使用： 例如一个图片找行人，已知一个算法可以识别一个20*100的方块里有没有人形，我们就可以使用从小到大不同尺寸的比例方块，每次移动一个sliding window的距离，切下来一个方块，调整到算法要求的比例尺寸，进行判断。</p><p>类似思路用在OCR上：</p><p>step1， 切方块，找可能的字符区域<br>step2， “expansion”算法，把字符区域放大，找出text rigion，并根据比例特征划掉干扰区域。<br>step3， 挑选出来的区域变成透明，其它区域全部遮住，对选出来的区域进行识别。<br>step4， 1D sliding window找出单个字符<br>step5， 识别单个字符</p><h2 id="如何得到大量训练数据"><a class="markdownIt-Anchor" href="#如何得到大量训练数据"></a> 如何得到大量训练数据</h2><p>以OCR为例，<br>real data，从真实图像中切出来的字母块<br>另外一个重要来源是： synthetic data： 人工合成</p><p>比如：<br>1） 使用字体库加随机背景制造。<br>2） 单个字体放大切块，进行distortion（变形）处理。</p><p>类似的思路用在声音识别中：<br>用一个标准的人声数据，加入不同的背景噪声，我们可以得到多个训练数据。</p><p>注意要不停问自己的问题，<br>1） 我的模型对吗？<br>2） 我要多久时间可以得到比现在多10倍的数据？我可以合成一些吗？ 我可以自己做吗？可以外包吗？（crowd source - 例如amazon mechenical turk）</p><h2 id="ceiling-analysis"><a class="markdownIt-Anchor" href="#ceiling-analysis"></a> ceiling analysis</h2><p>对于复杂问题，pipeline的每个节点都要资源，如何分配？ 使用ceiling analysis</p><p>从pipeline 源头开始，整个的训练数据的正确率70%，那么如果第一步的算法，我们直接假定算法准确率100%（将正确数据输入给下一步），最终正确率变成89%，iterate这个步骤（沿着pipeline方向），直到准确率提升到100%。</p><p>回过头看，那个步骤提升到100%对最终的准确率提升的最大？</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Photo OCR </tag>
            
            <tag> sliding window </tag>
            
            <tag> synthetic data </tag>
            
            <tag> ceiling analysis </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 10</title>
      <link href="2018/02/03/markdown/Trending/MachineLearning/MachineLearning_10/"/>
      <url>2018/02/03/markdown/Trending/MachineLearning/MachineLearning_10/</url>
      
        <content type="html"><![CDATA[<h1 id="数据为王"><a class="markdownIt-Anchor" href="#数据为王"></a> 数据为王</h1><p><strong>It’s not who has the best algorithm wins, it’s who has the most data.</strong></p><p>如何判断数据是否真的为王，以及多少数据就可以称王，可以使用learning curve帮助判断。<br>复习learning curve：</p><ul><li>横轴是训练集大小，纵轴是cost，<ul><li>先使用大小为m1的training set；求解后算cost，再带入CV set求cost，画两个点。</li><li>使用大小为m2的训练集。。。</li></ul></li><li>当训练集足够大，两条曲线贴近的时候，说明</li></ul><h1 id="如何解决数据集太大的问题"><a class="markdownIt-Anchor" href="#如何解决数据集太大的问题"></a> 如何解决数据集太大的问题</h1><h2 id="stochastic-gradient-descent"><a class="markdownIt-Anchor" href="#stochastic-gradient-descent"></a> Stochastic Gradient Descent</h2><p>传统Gradient descent 我们叫batch Gradient Descent，就是说算的时候所有数据都放进去。<br>Stochastic Gradient Descent, 换了一种思路，把数据一组一组取出来，使用当前选中的数据组调整参数，再取下一组数据。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">1） Randomly Shuffle Training Examples</span><br><span class="line">2)  repeat (1~10)&#123;</span><br><span class="line">    for i:=1...m&#123;</span><br><span class="line">    theta:=theta-alfpha(h(x)-y)x;</span><br><span class="line">        &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>如果有300，000，000数据，使用batch Gradient Descent，每次调整参数就需要所有的数据参与计算。但是对于stochastic gradient descent，一次repeat一次全数据参与，最多十次，运气好（同时也是数据多的时候），一次就可以得到很好的结果。<br>这是为什么这种算法很快的原因。</p><h3 id="stochastic-gradient-descent-converging"><a class="markdownIt-Anchor" href="#stochastic-gradient-descent-converging"></a> stochastic gradient descent converging</h3><p>如何判断算法在converging？<br>思路： 每组数据都用当前theta值算一下cost然后再更新theta， 每loop过1000组数据，算一下之前1k组数据cost的平均值。这样每1k组数据和之前1k组数据比较，看cost是不是在降低。<br>如何看图（横轴是iteration，纵轴是每隔1k的cost）：<br>1） alpha越小，曲线越平滑。越可能找到global 
optimization，但是慢<br>2） 1k看一次cost还能改成例如5k看一次，改的越大，曲线越平滑，但是缺点是要等很久才能再一次评估算法效果<br>3） 有时候曲线很多杂音，很难看出趋势，这时候可以调整1k到5k，可能趋势就可以看出来了，如果还是看不出，可能算法有误。<br>4） 如果曲线不下降反而上升，可能alpha太大了。</p><p>为了让它converge，我们可以随着学习，逐步降低alpha，例如 alpha=const1/(interationNum+const2)</p><h2 id="mini-batch-gradient-descent"><a class="markdownIt-Anchor" href="#mini-batch-gradient-descent"></a> mini-batch Gradient Descent</h2><ul><li>batch gradient descent : use m examples one iteration</li><li>stochastic gradient descent: use 1 examples one iteration</li><li>mini-batch gradient descent : use b examples one iteration (b is 2~200)</li></ul><p>** mini-batch 只有使用vectorized implementation 的时候性能才会比stochastic gradient descent。</p><h1 id="相关应用"><a class="markdownIt-Anchor" href="#相关应用"></a> 相关应用</h1><h2 id="online-learning"><a class="markdownIt-Anchor" href="#online-learning"></a> online learning</h2><p>类似stochastic gradient descent的思路，不停有新的streaming数据来，不停调整theta</p><h2 id="map-reduce"><a class="markdownIt-Anchor" href="#map-reduce"></a> map reduce</h2><p>大data set分成多组，并行计算公式中求和的部分，送到汇总节点，做最后一步的计算。</p><p>一些算法库，自动嵌入了map reduce功能。只要实现算法，实现的时候就会自动并行。</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Stochastic Gradient Descent </tag>
            
            <tag> mini-batch Gradient descent </tag>
            
            <tag> Map Reduce </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 9</title>
      <link href="2018/01/27/markdown/Trending/MachineLearning/MachineLearning_9/"/>
      <url>2018/01/27/markdown/Trending/MachineLearning/MachineLearning_9/</url>
      
        <content type="html"><![CDATA[<h1 id="anomaly-detection"><a class="markdownIt-Anchor" href="#anomaly-detection"></a> Anomaly Detection</h1><p>Anomaly Detection 不规则检测。举例子：飞机引擎的各种参数，如果一个新的引擎的发热或者其它参数突然与众不同，那么我们肯定担心有什么问题。</p><p>应用场景：<br>Fraud Detection<br>Manufactoring</p><p>根据data建模p(x)，当新的x造成<span>$p(x)&lt;$$\epsilon$$$</span><!-- Has MathJax -->的时候，就是说明数据异常。</p><h2 id="gaussian-normal-distribution"><a class="markdownIt-Anchor" href="#gaussian-normal-distribution"></a> Gaussian (normal) Distribution</h2><p>公式表达：</p><span>$$\begin{align*}X\thicksim N(\mu,\sigma^2)\end{align*}$$</span><!-- Has MathJax --><span>$\thicksim$</span><!-- Has MathJax --> 读作“distributed as”，N是normal的意思，<span>$\mu$</span><!-- Has MathJax -->代表正态分布的最高点投射到横轴的读数，<span>$\sigma$</span><!-- Has MathJax -->表示的是正态分布的驼峰的宽度。<span>$/sigma$</span><!-- Has MathJax -->又叫standard deviation.<span>$$\begin{align*}p(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi }\sigma }exp(-\frac{(x-\mu)^2}{\sigma^2})\end{align*}$$</span><!-- Has MathJax --><p>For a dataset,</p><span>$$\begin{align*}\{x^\left(1\right),x^\left(2\right),...x^\left(m\right)\}, x\left(i\right)\in\Re\end{align*}$$</span><!-- Has MathJax --><p>有以下求值公式来推导已知数据的正态分布：</p><span>$$\begin{align*}\mu=\frac{1}{m}\displaystyle\sum\limits_{i=1}^m x^\left(i\right)\end{align*}$$</span><!-- Has MathJax --><span>$$\begin{align*}\sigma ^2=\frac{1}{m}\displaystyle\sum\limits_{i=1}^m (x^\left(i\right)-\mu )^2\end{align*}$$</span><!-- Has MathJax --><ul><li>在统计分析课中，有时候除以m会写成除以（m-1），在machine learning中，m通常很大，所以我们忽略这种不同。</li></ul><h2 id="density-estimation"><a class="markdownIt-Anchor" href="#density-estimation"></a> Density Estimation</h2><p>当数据是n维的时候，我们的p(x)公式变成了：</p><span>$$\begin{align*}p(x)= \prod_{j=1}^np(x_j;\mu_j,\sigma_j^2)\end{align*}$$</span><!-- Has MathJax --><h3 id="algorithm-evaluation"><a class="markdownIt-Anchor" href="#algorithm-evaluation"></a> Algorithm evaluation</h3><p>直接使用cv或者test set算正确率不准确，因为stewed data。<br>使用cv set选择<span>$\varepsilon$</span><!-- 
Has MathJax --></p><p>For example: data from 10,000 normal engines and 40 defective engines.</p><p>Recommended split: 6,000 normal engines to fit the Gaussian model<br>2,000 normal engines and 20 defective engines as the CV set, used to choose <span>$\varepsilon$</span><!-- Has MathJax --><br>2,000 normal engines and 20 defective engines as the test set, used to verify the model.</p><h3 id="anomaly-detection-vs-supervised-learning"><a class="markdownIt-Anchor" href="#anomaly-detection-vs-supervised-learning"></a> Anomaly Detection vs Supervised Learning</h3><p>Anomaly detection: especially suitable when anomalous examples are very rare, i.e., there are many y=0 examples and only a few dozen y=1 examples (e.g., 1-20). Future anomalies may look completely different from the existing ones.<br>Supervised learning: large amounts of data with both positive and negative examples; future positive examples will be somewhat similar to the existing ones.</p><p>A typical comparison is fraud detection: if we do not yet know what fraud typically looks like, anomaly detection can find highly unusual operations; but if we have a lot of labeled data to learn from, fraud detection can be modeled with supervised learning, just like spam email classification.</p><ul><li>Examples of anomaly detection: fraud detection; manufacturing (e.g., aircraft engines); monitoring machines in a data center</li><li>Examples of supervised learning: spam email labeling; weather prediction; cancer classification</li></ul><h3 id="choose-correct-features-for-anomaly-detection-algorithm"><a class="markdownIt-Anchor" href="#choose-correct-features-for-anomaly-detection-algorithm"></a> Choosing the right features for an anomaly detection algorithm</h3><p>First, plot the data.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">% check the data size</span><br><span class="line">size(x)</span><br><span class="line">% plot the data</span><br><span class="line">hist(x)</span><br><span class="line">% adjust the plot display (use 50 bins)</span><br><span class="line">hist(x,50)</span><br><span class="line">% transform the data until it looks more Gaussian</span><br><span 
class="line">hist(x.^0.5,50)</span><br><span class="line">hist(x.^0.2,50)</span><br><span class="line">hist(x.^0.1,50)</span><br><span class="line">% define the new feature based on original one</span><br><span class="line">xNew=x.^0.1</span><br></pre></td></tr></table></figure><p>常用技巧： 使用正态分布找出异常数据，观察异常数据的异常之处，然后定义新的feature出来。<br>常用技巧： 挑选数据在异常时候能说明问题的，比如监控数据中心的例子中，cpu和网络traffic一般是线性关系，如果使用cpu/网络traffic就可以得到一个feature帮助识别cpu的infinite loop或死锁之类的问题。</p><h2 id="recommendation-system-formulation"><a class="markdownIt-Anchor" href="#recommendation-system-formulation"></a> Recommendation System Formulation</h2><h3 id="predicting-movie-rating"><a class="markdownIt-Anchor" href="#predicting-movie-rating"></a> Predicting Movie Rating</h3><p>使用Liner Regression来预测user对电影的打分。<br>方法： 创造feature，如action，romantic等，然后为每个movie量化这些feature，最后，根据用户之前的打分来找出用户的打分模型，从而预测用户的打分。</p><h4 id="collaborative-filtering"><a class="markdownIt-Anchor" href="#collaborative-filtering"></a> Collaborative filtering</h4><p>中心思想：</p><p>先猜测一下电影的feature<br>Loop<br>{<br>已知电影的feature，根据用户打分可以预测/修正用户的打分模型<br>已知用户打分模型（例如最喜欢action电影，不喜欢romance），可以根据用户打分预测/修正电影的feature参数<br>}</p><p>book store &amp; clothing store</p><h3 id="collaborative-filtering-algorithm"><a class="markdownIt-Anchor" href="#collaborative-filtering-algorithm"></a> Collaborative filtering Algorithm</h3><p>与其loop求值，不如将两组数据结合起来一起看。</p><p>cost function是：</p><span>$$\begin{align*}J(x^{(1)},,,x^{(n_m)},\theta^{(1)},,,\theta^{(n_u)})=\frac{1}{2}\sum_{ (i,j) :r(i,j)=1} (\theta^{(j)})^T x^{(i)} - y^{(i,j)} )^2 +\frac{\lambda }{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}{({x_k}^{(i)} )}^2+\frac{\lambda }{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}{({\theta_k}^{(j)} )}^2\end{align*}$$</span><!-- Has MathJax --><p>对应的算法是：<br>step 1， random 初始化所有的x和theta<br>step 2， gradient descent</p><p>求x，</p><span>$$\begin{align*}{x_k}^{(i)}:={x_k}^{(i)}-\alpha (\sum_{ j :r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)} ){\theta_k}^{(j)}  +\lambda {x_k}^{(i)} )\end{align*}$$</span><!-- Has 
MathJax --><p>Update theta:</p><span>$$\begin{align*}\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}+\lambda\theta_k^{(j)}\right)\end{align*}$$</span><!-- Has MathJax --><p>Note that the parameters must be initialized randomly; otherwise we run into a symmetry problem similar to the one in neural networks.</p><h3 id="low-rank-matrix-factorization"><a class="markdownIt-Anchor" href="#low-rank-matrix-factorization"></a> Low rank matrix factorization</h3><p>The matrix (vectorized) form of the algorithm above.</p><p>Predicted ratings: X*Theta^T<br>Finding similar movies: (formula to be added)</p><span>$$\begin{align*}X = \begin{bmatrix} - &amp; (x^{(1)})^T &amp; - \\ &amp; \vdots &amp; \\ - &amp; (x^{(n_m)})^T &amp; - \end{bmatrix},\ \Theta = \begin{bmatrix} - &amp; (\theta^{(1)})^T &amp; - \\ &amp; \vdots &amp; \\ - &amp; (\theta^{(n_u)})^T &amp; - \end{bmatrix}\end{align*}$$</span><!-- Has MathJax --><span>$$\begin{align*}X \Theta^T\end{align*}$$</span><!-- Has MathJax --><h3 id="mean-normalization-的使用"><a class="markdownIt-Anchor" href="#mean-normalization-的使用"></a> Using mean normalization</h3><p>When a user has never rated anything, the algorithm above predicts all-zero ratings for that user.<br>This is unreasonable.<br>After applying mean normalization, the predicted ratings for such a user are close to the average ratings.</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Density Estimation </tag>
            
            <tag> Gaussian Distribution </tag>
            
            <tag> Anomaly Detection Algorithm </tag>
            
            <tag> recommender system </tag>
            
            <tag> Collaborative filtering Algorithm </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>LaTex Editing</title>
      <link href="2018/01/21/markdown/Trending/MachineLearning/LaTexEditing/"/>
      <url>2018/01/21/markdown/Trending/MachineLearning/LaTexEditing/</url>
      
        <content type="html"><![CDATA[<h1 id="sample"><a class="markdownIt-Anchor" href="#sample"></a> Sample</h1><p><a href="https://www.tutorialspoint.com/online_latex_editor.php" target="_blank" rel="noopener">https://www.tutorialspoint.com/online_latex_editor.php</a></p><h1 id="online-testing"><a class="markdownIt-Anchor" href="#online-testing"></a> Online Testing</h1><p><a href="https://www.mathjax.org/#demo" target="_blank" rel="noopener">https://www.mathjax.org/#demo</a></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">\begin&#123;align*&#125;</span><br><span class="line">  here put the content that passed the testing</span><br><span class="line"></span><br><span class="line">\end&#123;align*&#125;</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> Latex </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 8</title>
      <link href="2018/01/20/markdown/Trending/MachineLearning/MachineLearning_8/"/>
      <url>2018/01/20/markdown/Trending/MachineLearning/MachineLearning_8/</url>
      
        <content type="html"><![CDATA[<h1 id="unsupervised-learning-overview"><a class="markdownIt-Anchor" href="#unsupervised-learning-overview"></a> Unsupervised learning overview</h1><p>栗子：</p><p>market segment； 社交网络； 机器群集； 天文数据</p><h2 id="k-means-算法"><a class="markdownIt-Anchor" href="#k-means-算法"></a> K-means 算法</h2><p>给一堆数据，请把它们分成K类。怎么做？</p><p>1） 随机得到K个点。</p><p>loop直到K的值不再变化{</p><p>2） 计算每个数据到K的点的距离，如果数据组i到K（j）的距离最小，就认为数据i属于K（j）这个组。<br>数据全部分类完之后，调整K个点的值，每个K直接赋值为当前被分到该类的数据的平均值。</p><p>}</p><h3 id="k-means的optimization-objective"><a class="markdownIt-Anchor" href="#k-means的optimization-objective"></a> K-means的Optimization Objective</h3><p>supervised ML有cost function； 这种unsupervised很难有正确答案来衡量。那么如何证明做得好不好呢？</p><p>K means最终的目标是： （（每个点到自己所属的参照点的距离）的平方）的和最小。</p><h3 id="如何random初始化cluster-centroid"><a class="markdownIt-Anchor" href="#如何random初始化cluster-centroid"></a> 如何random初始化Cluster Centroid</h3><p>idea： run k-means 很多次，每次都random initialize centoid，得到cluster后，算cost function，比较用最小cost的</p><p>这样可以避免local optism</p><h3 id="如何选择cluster-number"><a class="markdownIt-Anchor" href="#如何选择cluster-number"></a> 如何选择cluster number</h3><p>没有正确答案。</p><ul><li>Elbow Method：根据分组的不同，cost function随着cluster的数字变化（luckily会）呈现elbow的曲线（大多情况下没有这种图像）</li><li>根据需要来决定分组数目。 比如，T恤尺码分组（SML or xs SML xl）</li></ul><h3 id="使用k-means进行图像压缩的原理"><a class="markdownIt-Anchor" href="#使用k-means进行图像压缩的原理"></a> 使用k-means进行图像压缩的原理</h3><p>假设：原图是256色（每个pixel使用三组数据表示红黄蓝的梯度颜色，每组数据是8位二进制（0-256））。<br>目标：选择16色来代表原图（每个pixel使用三组数据，每组数据是4位二进制（0-16））。</p><p>使用k-means： 将原图的每个pixel看作一组数据，使用k means尝试将其分组到16色的分组中。 这里分组的目标确定了。<br>原先的每个pixel是256<em>3个feature, 新的目标映射每个pixel是16</em>3个feature。所以我们要尝试将数据分组为256/16=16组。</p><p>随机选择16个颜色（16*3）<br>计算每个pixel到这些颜色的距离，找出centroids，然后重新找k<br>直到k不再变化。<br>然后将原先像素点属于的k算出来取代原先的pixel</p><p>—》pixel个数不变，颜色维度变小。原图像（256<em>3</em>pixel数目），新的图像（16<em>3</em>pixel数目），压缩了16倍。</p><h1 id="motivation"><a class="markdownIt-Anchor" href="#motivation"></a> Motivation</h1><h2 id="数据压缩"><a 
class="markdownIt-Anchor" href="#数据压缩"></a> 数据压缩</h2><p>2D -》 1D （当数据都在一条线上）<br>3D -》 2D （当数据都在一个平面上）</p><p>数据visulization依赖于数据压缩，因为一般来说显示数据都是2D或者3D</p><h2 id="principal-component-analysis-pca"><a class="markdownIt-Anchor" href="#principal-component-analysis-pca"></a> Principal Component Analysis - PCA</h2><ul><li>Linner Regression和PCA的区别</li></ul><p>Linner Regression是找每个点到目标直线的沿着坐标轴的距离。PCA则是找每个点project到目标直线的距离。</p><p>PCA不是Linner Regression</p><h3 id="pca的算法"><a class="markdownIt-Anchor" href="#pca的算法"></a> PCA的算法</h3><ul><li>先处理数据： feature normalization 和 mean normalization</li><li>应用PCA算法将数据从n维降低到k维：</li></ul><p>先算sigma值，以下是如果x是一个vector，从1到n loop所有的vector的情况。</p><span>$$\begin{align*}Sigma=\frac{1}{m}\displaystyle\sum\limits_{i=1}^n (x ^\left(i\right))(x ^\left(i\right))^T\end{align*}$$</span><!-- Has MathJax --><p>当X来代表所有数据的时候，</p><span>$$\begin{align*}Sigma=\frac{1}{m}X^TX\end{align*}$$</span><!-- Has MathJax --><p>使用SVD (singlar value decomposition)算法，带入sigma。</p><span>$$\begin{align*}[u,s,v] = SVD (Sigma)\end{align*}$$</span><!-- Has MathJax --><p>返回的U是一个n<em>n的matrix。取前面k列，即得到n</em>k的matrix，这个matrix叫<span>$u_{reduce}$</span><!-- Has MathJax -->.</p><p>对于单个x， <span>$u_{reduce})^Tx$</span><!-- Has MathJax -->即可得到reduce的z。<br>对于一组X数据， 则直接进行矩阵求值：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">U_reduce = U(:,1:K);</span><br><span class="line">Z = X * U_reduce;</span><br></pre></td></tr></table></figure><h3 id="从压缩的数据反向得到原始数据"><a class="markdownIt-Anchor" href="#从压缩的数据反向得到原始数据"></a> 从压缩的数据反向得到原始数据</h3><p>n<em>k的<span>$u_{reduce}$</span><!-- Has MathJax --> 乘以 k</em>1 的z可以得到大致的最初n*1的x vector</p><span>$$\begin{align*}x_{approx}=u_{reduce}*z\end{align*}$$</span><!-- Has MathJax --><h3 id="如何选择压缩维度k"><a class="markdownIt-Anchor" href="#如何选择压缩维度k"></a> 如何选择压缩维度k</h3><p>“99% of variance is 
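retained”</p><p>This means that, after compression,</p><span>$$\begin{align*}\frac{\frac{1}{m}\displaystyle\sum\limits_{i=1}^m \lVert x^{(i)}-x_{approx}^{(i)}\rVert ^2}{\frac{1}{m}\displaystyle\sum\limits_{i=1}^m \lVert x^{(i)}\rVert ^2} \leqslant 0.01\end{align*}$$</span><!-- Has MathJax --><p>Use this criterion to guide the search for the best (smallest) k:</p><ul><li>Option 1: start from k=1 and increase k until the inequality above holds</li><li>Option 2: run SVD (singular value decomposition); the returned S is an n*n diagonal matrix (all entries are 0 except those on the diagonal from top-left to bottom-right).</li></ul><span>$$\begin{align*}[U,S,V] = \mathrm{SVD}(Sigma)\end{align*}$$</span><!-- Has MathJax --><p>Using the SVD result, check whether the following holds; if it does not, k is not large enough.</p><span>$$\begin{align*}\frac{\sum\limits_{i=1}^k S_{ii}}{\sum\limits_{i=1}^n S_{ii}}\geq0.99\end{align*}$$</span><!-- Has MathJax --><ul><li>The PCA mapping should be computed on the training set only, not on the CV or test set; the same mapping is then applied to the CV and test data.</li></ul><h4 id="pca的意义"><a class="markdownIt-Anchor" href="#pca的意义"></a> What PCA is good for</h4><ul><li>Compressing data to reduce storage and speed up learning algorithms</li><li>Compressing data to make visualization easier (the human eye can only perceive data in at most 3D)</li></ul><p>Incorrect uses:</p><ul><li>Mistake 1: using PCA to prevent overfitting; it can easily throw away useful information in the data. Regularization is the right tool for that instead.</li><li>Mistake 2: using PCA as a default data preprocessing step. Only consider PCA for data compression when the normal analysis on the raw data does not work.</li></ul>]]></content>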
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Unsupervised Learning </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 7</title>
      <link href="2018/01/13/markdown/Trending/MachineLearning/MachineLearning_7/"/>
      <url>2018/01/13/markdown/Trending/MachineLearning/MachineLearning_7/</url>
      
        <content type="html"><![CDATA[<h1 id="svm"><a class="markdownIt-Anchor" href="#svm"></a> SVM</h1><h1 id="large-margin-classifiers"><a class="markdownIt-Anchor" href="#large-margin-classifiers"></a> Large Margin Classifiers</h1><p>SVM 又叫Large Margin Classifier。</p><p>概念：<br>Margin of SVM<br>SVM叫Large Margin Classifier因为这种算法找出来的参数对数据进行区分的时候会找到最大的Margin处才划线。</p><p>C跟lambda的值的意义是相反的。C约等于1/lambda。所以，C取很大的值相当于lambda取很小的值，这时候，模型会尽量fit数据（可能会overfit）。反之，C取很小的值的时候，类似于我们把lambda取很大的值一样， 这时候，数据有混合的时候，模型参数会进行忽略那些极个别的数据。</p><h1 id="kernel"><a class="markdownIt-Anchor" href="#kernel"></a> Kernel</h1><p>Kernel is a similarity function. 在坐标图上， 它 体现的是一个点到周边feature点的距离比例（结果是0-1， 0代表相似度0，或者说很远，1代表很相似）。<br>课程中使用的function是Kernel function的一种，叫做高斯kernel（Gaussian Kernel）</p><p>SVM 中的C 和 sigma</p><p>C 跟lambda相反; 一个大lambda是防止overfit的，那么一个小的C也是一样的效果。所以C越小越容易overfit，越viriant，C越大越容易bias<br>如果采用高斯kernel，则需要选择sigma</p><p>SVM也是用结果logistic regression一样的问题。</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> SVM </tag>
            
            <tag> Large Margin Classifiers </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 6</title>
      <link href="2018/01/09/markdown/Trending/MachineLearning/MachineLearning_6/"/>
      <url>2018/01/09/markdown/Trending/MachineLearning/MachineLearning_6/</url>
      
        <content type="html"><![CDATA[<h1 id="what-do-to-next"><a class="markdownIt-Anchor" href="#what-do-to-next"></a> What do to next</h1><p>如果当前的建模不够好，误差很大，怎么办？有以下Options</p><ul><li>收集更多训练数据</li><li>减少无关feature</li><li>增加相关feature</li><li>组合，变形，使用Polynomial feature</li><li>减小lambda</li><li>增大lambda</li></ul><h2 id="如何评估模型"><a class="markdownIt-Anchor" href="#如何评估模型"></a> 如何评估模型？</h2><h3 id="数据七三开"><a class="markdownIt-Anchor" href="#数据七三开"></a> 数据七三开</h3><p>把数据七三开（随机筛选分组）。使用七成数据训练参数，三成数据评估。</p><p>对于Liner Regression来说，评估使用的公式是：</p><span>$J_{test}(\Theta) = \dfrac{1}{2m_{test}} \sum_{i=1}^{m_{test}}(h_\Theta(x^{(i)}_{test}) - y^{(i)}_{test})^2$</span><!-- Has MathJax --><p>对于Logical Regression来说，单组数据的错误是这样评估的：</p><span>$err(h_\Theta(x),y) = \begin{matrix} 1 &amp; \mbox{if } h_\Theta(x) \geq 0.5\ and\ y = 0\ or\ h_\Theta(x) &lt; 0.5\ and\ y = 1\newline 0 &amp; \mbox otherwise \end{matrix}$</span><!-- Has MathJax --><p>那么对于多组评估数据来说，其评估公式为：</p><span>$\text{Test Error} = \dfrac{1}{m_{test}} \sum^{m_{test}}_{i=1} err(h_\Theta(x^{(i)}_{test}), y^{(i)}_{test})$</span><!-- Has MathJax --><p>其中心思想是：<br>对于随机分组的数据来说，正确的算法的cost function表现应该一致且小。 如果表现不一致说明可能over fitting了，如果不够小，说明模型不契合。</p><h3 id="更进一步六二二开"><a class="markdownIt-Anchor" href="#更进一步六二二开"></a> 更进一步六二二开</h3><p>中心思想，数据随机分组。<br>Training set用来训练参数。（训练多个不同的模型，每个模型都得到一组参数）<br>Cross Validation set用来选择Polynomial模型（选最小的）<br>Test Set用来验证模型。（用选择的Polynomial degree的模型测试cost function还是很小）。</p><h3 id="一些概念"><a class="markdownIt-Anchor" href="#一些概念"></a> 一些概念</h3><p>High bias (underfitting): both Jtrain(Θ) and JCV(Θ) will be high. 
Also, JCV(Θ)≈Jtrain(Θ).</p><p>High variance (overfitting): Jtrain(Θ) will be low and JCV(Θ) will be much greater than Jtrain(Θ).</p><h2 id="如何结合使用选择lambda"><a class="markdownIt-Anchor" href="#如何结合使用选择lambda"></a> How to choose lambda</h2><p>Approach:</p><p>Try lambda values from small to large (e.g., from 0.1 to 10). For a given model, train it with each lambda, then compute the cost on the cross validation set without the regularization term; the lambda with the lowest CV cost is the most reasonable choice.</p><h2 id="learning-curve"><a class="markdownIt-Anchor" href="#learning-curve"></a> Learning Curve</h2><p>Use the training set size as the x-axis and plot the cost on the training set and on the CV set.</p><p>For a good model, the CV cost decreases steadily as the amount of data grows, while the training cost rises slightly, and the two converge to a low value.</p><p>For a high-bias model, more data does not help: the training cost does not decrease but levels off at a relatively high value, close to the CV cost.<br>For a high-variance model, the training cost increases as the data grows and then stabilizes at a low value, but the CV cost stays much higher (a large gap between the two curves).</p><h2 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h2><p>Our decision process can be broken down as follows:</p><ul><li>Getting more training examples: Fixes high variance</li><li>Trying smaller sets of features: Fixes high variance</li><li>Adding features: Fixes high bias</li><li>Adding polynomial features: Fixes high bias</li><li>Decreasing λ: Fixes high bias</li><li>Increasing λ: Fixes high variance.</li></ul><p>Diagnosing Neural Networks</p><ul><li>A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.<br>A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.</li><li>Using a single hidden layer is a good starting default. You can train your neural network on a number of hidden layers using your cross validation set. You can then select the one that performs best.</li></ul><p>Model Complexity Effects:</p><ul><li><p>Lower-order polynomials (low model complexity) have high bias and low variance. In this case, the model fits poorly consistently.</p></li><li><p>Higher-order polynomials (high model complexity) fit the training data extremely well and the test data extremely poorly. 
These have low bias on the training data, but very high variance.</p></li></ul><p>In reality, we would want to choose a model somewhere in between, that can generalize well but also fits the data reasonably well.</p><h1 id="下一步的工作"><a class="markdownIt-Anchor" href="#下一步的工作"></a> 下一步的工作</h1><p>举个邮件spam的例子：</p><ul><li>Honeypot project --》 负面的例子。 为了解决邮件spam的问题，突发奇想注册专门的邮箱收集大量spam邮件来进行数据分析。虽然spam问题确实需要大量数据，但是这样做并不能保证成功。</li><li>通常spam分析会有1w到5w的word作为feature，可能会通过邮件分析top常用词的方式。</li><li>很复杂的情况会出现：区分大小写吗？区分单复数吗？错误的拼写如何应对？（w4tch，代表watch，但是正常不会被算作过滤词，因为拼写错误）</li><li>加入复杂的feature，对于邮件的header区域的信息进行分析。</li></ul><h2 id="error-analysis"><a class="markdownIt-Anchor" href="#error-analysis"></a> Error Analysis</h2><ul><li>算法从简单的开始，迅速实现，使用cv数据测试，验证</li><li>观察learning curve决定是否需要更多数据？ 更多feature？</li><li>Error Analysis — 以spam为例，把算法归类错的训练数据拿出来进行分析。<ul><li>对错误进行分类，比如，如果fishing email这块做得最差，那么改进算法应该focus在这一块</li><li>进行numerical evaluation。加入steming和不用stemming；区分大小写和不区分大小写； 哪种方式效果好？</li></ul></li></ul><h2 id="处理skewed-data"><a class="markdownIt-Anchor" href="#处理skewed-data"></a> 处理skewed data</h2><p>什么是skewed data？<br>典型场景： 癌症分类算法，实际数据0.5%的总人数是positive，算法error比例是1%。这种情况下，hardcode result=0的error比例将为是0.5%，这个结果看起来都会比算法的error比例1%小。那么怎么正确衡量算法的准确度呢？</p><p>这里引入两个概念：Pricision和recall<br>在算法中，rare case的情况我们标示为1.<br>Pricision：在所有算法预测结果为1的训练数据中，有多少是实际是1？<br>Recall： 在所有实际是1的训练数据中，有多少被算法预测为1？</p><p>上面这两个比例越高说明算法准确度越高。Cheating的算法是无法通过Pricision和Recall的衡量测试的。<br>例如，上面的cheating算法，recall为0.</p><h3 id="trading-off-precision和recall"><a class="markdownIt-Anchor" href="#trading-off-precision和recall"></a> trading off precision和recall</h3><p>实际算法比较中，如果我们调整threshold（阀值），我们可以看到precision和recall之间的关系。<br>比如，比例》0.9才算positive，那么precision会很高，但是recall就会变低。反之亦然。</p><p>这样的情况，如果有多个算法，recall高的precision低，我们如何比较算法的准确率呢？<br>首先，（recall+precision）/2 的值用来比较是绝对不行的，参见上面hardcode result=0的算法，那个算法使用平均值衡量的话，会可能脱颖而出。</p><p>需要一种算法能够综合考虑precision和recall的表现，这种情况有多种公式可以处理，本课程引入F1的算法。<br>F1 = 2* P*R（P+R）</p><p>基本上就是P和R表现的平均好的才会脱颖而出。</p><h2 
id="使用大量数据"><a class="markdownIt-Anchor" href="#使用大量数据"></a> 使用大量数据</h2><p>前面讲过，算法不对，数据多也是白搭。<br>但是对于一些算法，比如AI词汇填空，实际证明了，数据越多，算法越精准。那么怎么衡量数据多好不好呢？</p><p>提问： 如果你有这些数据，对于human expert来说，他能不能精准的给出正确答案？if yes， go ahead， 收集越多数据，模型会越精准。if no，说明再多数据也没用。</p><p>极端例子：比如只收集的房子的面积数据，如果问地产中介，他也给不出估价。这种情况就说明，数据多了也没用。</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Machine Learning Methodology </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 5</title>
      <link href="2017/12/30/markdown/Trending/MachineLearning/MachineLearning_5/"/>
      <url>2017/12/30/markdown/Trending/MachineLearning/MachineLearning_5/</url>
      
        <content type="html"><![CDATA[<h1 id="neron-network的cost-function"><a class="markdownIt-Anchor" href="#neron-network的cost-function"></a> Neural network cost function</h1><p>The neural network cost function is a generalization of the logistic regression cost function.</p><p>First, some notation:</p><p>L = total number of layers in the network</p><p><span>$s_l$</span><!-- Has MathJax --> = number of units in layer l (not counting the bias unit)</p><p>K = number of units in the output layer (i.e. the number of classes)</p><ul><li><p>If K ≥ 3, the output for one input is a vector: each component reflects how likely it is that the input belongs to the corresponding class (e.g. image classification). (For binary classification, K = 1: a single output unit with a value between 0 and 1 is enough.)</p></li><li><p>Cost function for regularized logistic regression</p></li></ul><span>$J(\theta) = - \frac{1}{m} \sum_{i=1}^m [ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$</span><!-- Has MathJax --><ul><li>Cost function for neural networks</li></ul><span>$\begin{gather*} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2\end{gather*}$</span><!-- Has MathJax -->
<h1 id="neron-network的求值算法-back-propagation-algorithm"><a class="markdownIt-Anchor" href="#neron-network的求值算法-back-propagation-algorithm"></a> Training a neural network: the Back Propagation Algorithm</h1><p>Given a training set:</p><span>$\lbrace (x^{(1)}, y^{(1)}) \cdots (x^{(m)}, y^{(m)})\rbrace$</span><!-- Has MathJax --><p>Initialize:</p><span>$\Delta^{(l)}_{i,j} := 0 \text{ for all } (l, i, j)$</span><!-- Has MathJax --><p>Start of loop (take one training example at a time, t = 1, …, m):</p><hr><ul><li>Step 1: set a<sup>(1)</sup> := x<sup>(t)</sup> and use forward propagation to compute the activations a<sup>(l)</sup> of every layer, which gives the prediction for this example.</li><li>Step 2: starting from <span>$\delta^{(L)} = a^{(L)} - y^{(t)}$</span><!-- Has MathJax -->, use backpropagation to compute δ<sup>(L−1)</sup>, …, δ<sup>(2)</sup> (there is no δ<sup>(1)</sup>, because the input layer has no error term).</li></ul><p>The backpropagation formula is:</p><span>$\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)})\ .*\ a^{(l)}\ .*\ (1 - a^{(l)})$</span><!-- Has MathJax --><p>The factor a<sup>(l)</sup> .* (1 − a<sup>(l)</sup>) is the derivative of the sigmoid evaluated at z<sup>(l)</sup>; some texts write it as the g-prime derivative term:</p><span>$g&apos;(z^{(l)}) = a^{(l)}\ .*\ (1 - a^{(l)})$</span><!-- Has MathJax --><ul><li>Step 3: accumulate this example's contribution into the big Δ matrix of each layer:</li></ul><span>$\Delta^{(l)}_{i,j} := \Delta^{(l)}_{i,j} + a_j^{(l)} \delta_i^{(l+1)}$</span><!-- Has MathJax --><p>or, vectorized:</p><span>$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$</span><!-- Has MathJax --><hr><p>End of loop</p><p>After the loop, compute the overall partial derivatives. For j ≠ 0:</p><span>$D^{(l)}_{i,j} := \dfrac{1}{m}\left(\Delta^{(l)}_{i,j} + \lambda\Theta^{(l)}_{i,j}\right)$</span><!-- Has MathJax --><p>For j = 0 (the bias terms, which are not regularized):</p><span>$D^{(l)}_{i,j} := \dfrac{1}{m}\Delta^{(l)}_{i,j}$</span><!-- Has MathJax --><p>The D matrices are the partial derivatives of the cost:</p><span>$\dfrac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = D^{(l)}_{i,j}$</span><!-- Has MathJax --><p>What is backpropagation actually doing?</p><p>It works backward from the output error δ<sup>(L)</sup> to compute an error term δ for every unit in every layer.<br>Each unit's δ is a weighted sum of the δ values of the units it fed into during forward propagation, weighted by the Θ values of those connections (and then multiplied by the g-prime term).</p>
<h2 id="算法实现细节"><a class="markdownIt-Anchor" href="#算法实现细节"></a> Implementation details</h2><h3 id="rolling-parameters"><a class="markdownIt-Anchor" href="#rolling-parameters"></a> Unrolling parameters</h3><p>To call fminunc() for the optimal θ the same way as before, the parameters must be passed as a single vector rather than as several matrices.<br>So we “unroll” the Θ matrices into one long vector. This does not change the computation: inside the cost function, the vector is reshaped back into matrices.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">thetaVector = [ Theta1(:); Theta2(:); Theta3(:) ];</span><br><span class="line">deltaVector = [ D1(:); D2(:); D3(:) ];</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Theta1 = reshape(thetaVector(1:110), 10, 11);</span><br><span class="line">Theta2 = reshape(thetaVector(111:220), 10, 11);</span><br><span class="line">Theta3 = reshape(thetaVector(221:231), 1, 11);</span><br></pre></td></tr></table></figure><p>Summary:<br>Unroll the Θ matrices into one big vector and pass it to fminunc to optimize θ.<br>Inside the cost function, reshape the vector back into Θ matrices to compute J(Θ) and the D matrices.<br>Unroll the D matrices into one gradient vector and return it to fminunc together with J(Θ).</p>
<h3 id="gradient-checking"><a class="markdownIt-Anchor" href="#gradient-checking"></a> Gradient Checking</h3><p>Gradient checking verifies that the forward propagation and backpropagation implementations are correct.</p><p>The two-sided difference approximation is:</p><span>$\dfrac{\partial}{\partial\Theta}J(\Theta) \approx \dfrac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}$</span><!-- Has MathJax --><p>For a network with many parameters, we compute a numerical derivative for each Θ<sub>j</sub> in turn. This is extremely expensive, so use it only to verify that the backpropagation implementation is correct; once it is verified, stop running the check.</p><span>$\dfrac{\partial}{\partial\Theta_j}J(\Theta) \approx \dfrac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$</span><!-- Has MathJax --><p>where</p><span>$\epsilon = 10^{-4}$</span><!-- Has MathJax --><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">epsilon = 1e-4;</span><br><span class="line">n = numel(theta);             % theta: the unrolled parameter vector</span><br><span class="line">gradApprox = zeros(size(theta));</span><br><span class="line">for i = 1:n</span><br><span class="line">  thetaPlus = theta;</span><br><span class="line">  thetaPlus(i) += epsilon;</span><br><span class="line">  thetaMinus = theta;</span><br><span class="line">  thetaMinus(i) -= epsilon;</span><br><span class="line">  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*epsilon);</span><br><span class="line">end</span><br><span class="line">% Then check that gradApprox is close to deltaVector from backpropagation.</span><br></pre></td></tr></table></figure>
<h3 id="random-initiation"><a class="markdownIt-Anchor" href="#random-initiation"></a> Random Initialization</h3><p>Initializing θ to all zeros works for logistic regression but not for neural networks: every hidden unit would compute the same function and receive the same update, so the units stay identical and the network cannot learn useful parameters.<br>We therefore initialize θ randomly to break the symmetry (symmetry breaking), drawing each parameter from the range [−INIT_EPSILON, INIT_EPSILON].</p><p>If Theta1 is 10x11, Theta2 is 10x11 and Theta3 is 1x11 (INIT_EPSILON is a small constant you choose yourself; it is not a built-in Octave constant):</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">INIT_EPSILON = 0.12;  % example value (an assumption); pick a small constant</span><br><span class="line">Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;</span><br><span class="line">Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;</span><br><span class="line">Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;</span><br></pre></td></tr></table></figure>
<h1 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h1><h2 id="选择neron-network的architecture"><a class="markdownIt-Anchor" href="#选择neron-network的architecture"></a> Choosing a neural network architecture</h2><ol><li>Number of input units = number of features.</li><li>Number of output units = number of classes.</li><li>Default to one hidden layer. More hidden units generally work better, but cost more computation.</li><li>If there is more than one hidden layer, give every hidden layer the same number of units.</li></ol><h2 id="train-neron-network"><a class="markdownIt-Anchor" href="#train-neron-network"></a> Training a neural network</h2><ol><li><p>Randomly initialize the weights</p></li><li><p>Implement forward propagation to get h<sub>Θ</sub>(x<sup>(i)</sup>) for any x<sup>(i)</sup></p></li><li><p>Implement the cost function</p></li><li><p>Implement backpropagation to compute partial derivatives</p></li><li><p>Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.</p></li><li><p>Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.</p></li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">for i = 1:m</span><br><span class="line">  % Perform forward propagation and backpropagation using example (x(i), y(i))</span><br><span class="line">  % (get activations a(l) and delta terms d(l) for l = 2, ..., L)</span><br><span class="line">end</span><br></pre></td></tr></table></figure><p>Keep in mind that J(Θ) is not convex, so the optimizer may end up in a local minimum rather than the global minimum.</p><h1 id="neron-network-的应用案例"><a class="markdownIt-Anchor" href="#neron-network-的应用案例"></a> Neural network application example</h1><p>Autonomous driving:<br>a 3-layer network learned, in about 2 minutes, to imitate how a human driver responds to the road images.</p><p>How does it decide?</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Neron Network </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 4</title>
      <link href="2017/12/26/markdown/Trending/MachineLearning/MachineLearning_4/"/>
      <url>2017/12/26/markdown/Trending/MachineLearning/MachineLearning_4/</url>
      
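        <content type="html"><![CDATA[<h1 id="non-liner-overview"><a class="markdownIt-Anchor" href="#non-liner-overview"></a> Non-linear hypotheses: overview</h1><p>Consider image recognition. For a 50×50-pixel image, with each pixel as one feature:</p><ul><li>a grayscale image has 50 × 50 = 2,500 features;</li><li>a color image, counting the red, green and blue channels separately, has 50 × 50 × 3 = 7,500 features.</li></ul><p>If we build the model from quadratic features (every product x<sub>i</sub> × x<sub>j</sub> of two features),</p><p>then n features give roughly n²/2 pairwise products, so even the 2,500-pixel grayscale image needs about 3 million features (2,500² / 2 ≈ 3.1 million).</p><p><strong>Conclusion: in non-linear problems, the difficulty is often that there are far too many features. Neural networks help simplify such problems a great deal.</strong></p><h2 id="neron-and-brain"><a class="markdownIt-Anchor" href="#neron-and-brain"></a> Neurons and the brain</h2><p>Scientists have found that brain regions can learn to process whatever signals they are trained on. For example, if the optic nerve is rerouted to the auditory cortex, the auditory cortex eventually learns to process visual information.</p><p>Similar experiments include:</p><ul><li>Feeding image signals in through the tongue, training the neurons that process touch to “see”.</li><li>Using sound for echolocation (for example, a child born without eyes learning to sense the surroundings from echoes).</li><li>Using a belt that signals which direction is north, giving humans a bird-like sense of direction.</li><li>Implanting an extra eye in a frog and having its neurons learn to use it.</li></ul><h3 id="neron-network"><a class="markdownIt-Anchor" href="#neron-network"></a> Neural network</h3><p>At a very simple level, neurons are basically computational units that take electrical <strong>inputs (called “spikes”)</strong> through <strong>dendrites</strong> and channel them to outputs (axons). In our model, the dendrites are like the input features x<sub>1</sub> ⋯ x<sub>n</sub>, and the output is the result of our hypothesis function. In this model the <strong>x<sub>0</sub> input node is sometimes called the “bias unit.”</strong> It is always equal to 1. In neural networks, we use the same logistic function as in classification, <span>$\frac{1}{1 + e^{-\theta^Tx}}$</span><!-- Has MathJax -->, yet we sometimes call it a <strong>sigmoid (logistic) activation function</strong>. 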
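</p><p>A minimal sketch of one logistic unit, written in Python rather than the Octave used elsewhere in these notes: it prepends the bias input x<sub>0</sub> = 1 and returns g(θ<sup>T</sup>x). The function names <code>sigmoid</code> and <code>neuron</code> are hypothetical, and the weights (−30, 20, 20) are borrowed from the AND example further down this page.</p>

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(theta: Sequence[float], x: Sequence[float]) -> float:
    """One logistic unit: add the bias input x0 = 1, then return g(theta^T x)."""
    inputs = [1.0] + list(x)
    if len(theta) != len(inputs):
        raise ValueError("theta needs one weight per input plus one for the bias unit")
    z = sum(t * xi for t, xi in zip(theta, inputs))
    return sigmoid(z)


# Weights (-30, 20, 20): the AND example from later in these notes.
and_weights = [-30.0, 20.0, 20.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(neuron(and_weights, [x1, x2]), 4))
```

<p>The output is close to 1 only when both inputs are 1, because the sigmoid saturates when |θ<sup>T</sup>x| is large; this is why a single unit with suitable weights can act as a logic gate.</p><p>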
In this situation, our “theta” parameters are sometimes called “weights”.</p><p>A basic neuron: several inputs go into the unit, which computes an output from them.</p><span>$\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline \end{bmatrix}\rightarrow\begin{bmatrix}\ \ \ \newline \end{bmatrix}\rightarrow h_\theta(x)$</span><!-- Has MathJax --><p>Notation for the units and parameters of each layer:</p><span>$\begin{align*}&amp; a_i^{(j)} = \text{&quot;activation&quot; of unit $i$ in layer $j$} \newline&amp; \Theta^{(j)} = \text{matrix of weights controlling function mapping from layer $j$ to layer $j+1$}\end{align*}$</span><!-- Has MathJax --><p>For example, a neural network can be written as follows (the leftmost layer is the input layer, the rightmost is the output layer, and the layers in between are hidden layers):</p><span>$\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline x_3\end{bmatrix}\rightarrow\begin{bmatrix}a_1^{(2)} \newline a_2^{(2)} \newline a_3^{(2)} \newline \end{bmatrix}\rightarrow h_\theta(x)$</span><!-- Has MathJax --><p>The activation of each unit is computed as:</p><span>$\begin{align*} a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \newline a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \newline a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \newline h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \newline \end{align*}$</span><!-- Has MathJax --><span>$\text{If network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.}$</span><!-- Has MathJax --><p>Picture a multi-layer network in which every layer except the output layer also has a bias unit. So whenever a layer is computed from the previous one, the parameters form a matrix whose number of rows equals the number of units in the current layer and whose number of columns equals the number of units in the previous layer plus 1 (for the bias unit).</p>
<h3 id="neron-network的简化表达"><a class="markdownIt-Anchor" href="#neron-network的简化表达"></a> Vectorized notation</h3><p>Let:</p><span>$z^{(j)} = \Theta^{(j-1)}a^{(j-1)}$</span><!-- Has MathJax --><p>then:</p><span>$a^{(j)} = g(z^{(j)})$</span><!-- Has MathJax -->
<h3 id="neron-network-in-simple-action"><a class="markdownIt-Anchor" href="#neron-network-in-simple-action"></a> Neural network in simple action: AND</h3><p>Here is the AND function modeled by a neural network (the weights are given in advance; recall that g(4.6) ≈ 0.99 and g(−4.6) ≈ 0.01):</p><span>$\begin{align*}&amp; h_\Theta(x) = g(-30 + 20x_1 + 20x_2) \newline \newline &amp; x_1 = 0 \ \ and \ \ x_2 = 0 \ \ then \ \ g(-30) \approx 0 \newline &amp; x_1 = 0 \ \ and \ \ x_2 = 1 \ \ then \ \ g(-10) \approx 0 \newline &amp; x_1 = 1 \ \ and \ \ x_2 = 0 \ \ then \ \ g(-10) \approx 0 \newline &amp; x_1 = 1 \ \ and \ \ x_2 = 1 \ \ then \ \ g(10) \approx 1\end{align*}$</span><!-- Has MathJax --><h3 id="neron-network-in-simple-action-ii"><a class="markdownIt-Anchor" href="#neron-network-in-simple-action-ii"></a> Neural network in simple action II: XNOR</h3><p>XNOR is built with a network that has one hidden layer (two weight matrices), by combining simple single-unit networks: the first hidden unit computes x<sub>1</sub> AND x<sub>2</sub>, the second computes (NOT x<sub>1</sub>) AND (NOT x<sub>2</sub>), and the output unit computes the OR of the two.</p><span>$\Theta^{(1)} =\begin{bmatrix}-30 &amp; 20 &amp; 20 \newline 10 &amp; -20 &amp; -20\end{bmatrix}$</span><!-- Has MathJax --><span>$\Theta^{(2)} =\begin{bmatrix}-10 &amp; 20 &amp; 20\end{bmatrix}$</span><!-- Has MathJax --><span>$\begin{align*}&amp; a^{(2)} = g(\Theta^{(1)} \cdot x) \newline&amp; a^{(3)} = g(\Theta^{(2)} \cdot a^{(2)}) \newline&amp; h_\Theta(x) = a^{(3)}\end{align*}$</span><!-- Has MathJax -->
<h2 id="结果是multiclass的neron-network"><a class="markdownIt-Anchor" href="#结果是multiclass的neron-network"></a> Multiclass neural networks</h2><p>In the examples above, the output is a single value, 0 or 1. What if there are multiple classes?<br>For example, in handwritten digit recognition the result is one of the 10 digits 0 to 9, so the output is not a single value but a vector.</p><p>For example, with four classes, the output for an example that belongs to the third class is:</p><span>$h_\Theta(x) =\begin{bmatrix}0 \newline 0 \newline 1 \newline 0 \newline\end{bmatrix}$</span><!-- Has MathJax --><p>So in such a model, the last Θ matrix has one row per output unit, i.e. as many rows as the dimension of the output vector.</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Neron Network </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Octave</title>
      <link href="2017/12/26/markdown/Trending/MachineLearning/Octave/"/>
      <url>2017/12/26/markdown/Trending/MachineLearning/Octave/</url>
      
        <content type="html"><![CDATA[<h1 id="octave-on-windows"><a class="markdownIt-Anchor" href="#octave-on-windows"></a> Octave on Windows</h1><p>Run bin/octave-cli.exe.</p><p>Add a directory to the search path:</p><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">addpath('D:\workspaces\bitbucket\machinelearninghomework\machine-learning-ex2\ex2')</span><br></pre></td></tr></table></figure><p>An if/else example:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">x = 4;  % sample value</span><br><span class="line">if (rem (x, 2) == 0)</span><br><span class="line">  printf (&quot;x is even\n&quot;);</span><br><span class="line">else</span><br><span class="line">  printf (&quot;x is odd\n&quot;);</span><br><span class="line">endif</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> Octave </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 3</title>
      <link href="2017/12/24/markdown/Trending/MachineLearning/MachineLearning_3/"/>
      <url>2017/12/24/markdown/Trending/MachineLearning/MachineLearning_3/</url>
      
        <content type="html"><![CDATA[<p>Why can't we use linear regression to model classification problems?<br>1) The labels in the training set are discrete, not continuous.<br>2) The labels are 0 or 1, and a linear hypothesis, which can output any real value, has trouble fitting data shaped like that.</p><h1 id="对classification的问题进行数学建模"><a class="markdownIt-Anchor" href="#对classification的问题进行数学建模"></a> Modeling classification problems</h1><p>Introduce the function g(z):<br>sigmoid function = logistic function<br>Its shape keeps the output strictly between 0 and 1.</p><p><strong>Note that θ and x here are single column vectors (one example, used for modeling). In the matrix form for many examples, each example x is a row of X rather than a column, so the expression is written differently.</strong></p><span>$\begin{align*}&amp; h_\theta (x) = g ( \theta^T x ) \newline \newline&amp; z = \theta^T x \newline&amp; g(z) = \dfrac{1}{1 + e^{-z}}\end{align*}$</span><!-- Has MathJax --><p>where g(z) is the well-known sigmoid function.</p><p>The hypothesis output is interpreted as a probability:</p><span>$\begin{align*}&amp; h_\theta(x) = P(y=1 | x ; \theta) = 1 - P(y=0 | x ; \theta) \newline&amp; P(y = 0 | x;\theta) + P(y = 1 | x ; \theta) = 1\end{align*}$</span><!-- Has MathJax -->
<h2 id="decision-boundary"><a class="markdownIt-Anchor" href="#decision-boundary"></a> Decision Boundary</h2><p>Once the model is chosen and a set of parameters θ is fixed, the parameters define a decision boundary that separates the data into the two predicted classes.</p><p>Suppose we use the following prediction rule:</p><span>$\begin{align*}&amp; h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline&amp; h_\theta(x) &lt; 0.5 \rightarrow y = 0 \newline\end{align*}$</span><!-- Has MathJax --><p>Since g(z) ≥ 0.5 exactly when z ≥ 0, this is equivalent to:</p><span>$\begin{align*}&amp; \theta^T x \geq 0 \Rightarrow y = 1 \newline&amp; \theta^T x &lt; 0 \Rightarrow y = 0 \newline\end{align*}$</span><!-- Has MathJax --><p>The decision boundary is not limited to linear models; with polynomial (non-linear) features it can be a curve.</p>
<h2 id="cost-function-for-logistic-regression-model"><a class="markdownIt-Anchor" href="#cost-function-for-logistic-regression-model"></a> Cost Function for logistic regression model</h2><p>If we reuse the squared-error cost from linear regression with the sigmoid hypothesis, J(θ) is not convex. So we use a different cost that suits the problem:</p><span>$\begin{align*}&amp; J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \newline &amp; \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) \; &amp; \text{if } y = 1 \newline &amp; \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) \; &amp; \text{if } y = 0\end{align*}$</span><!-- Has MathJax --><p>Interpretation:</p><p>If the prediction is confidently the opposite of the actual label, the cost approaches infinity, which strongly pushes the parameters toward better values.</p><span>$\begin{align*}&amp; \mathrm{Cost}(h_\theta(x),y) = 0 \text{ if } h_\theta(x) = y \newline &amp; \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 0 \; \mathrm{and} \; h_\theta(x) \rightarrow 1 \newline &amp; \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 1 \; \mathrm{and} \; h_\theta(x) \rightarrow 0 \newline \end{align*}$</span><!-- Has MathJax --><p>The two cases can be combined into one expression:</p><span>$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$</span><!-- Has MathJax --><p>So the full cost function is:</p><span>$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$</span><!-- Has MathJax --><p>Vectorized:</p><p><strong>Note that h here is the vector of predictions for all training examples (checked by hand). The later exercises have the same issue: computing a single prediction and computing predictions for a whole set of examples often require different expressions for θ and x.</strong></p><span>$\begin{align*} &amp; h = g(X\theta)\newline &amp; J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right) \end{align*}$</span><!-- Has MathJax -->
<h2 id="根据cost-function来确定logistic-regression模型的gradient-descent-算法"><a class="markdownIt-Anchor" href="#根据cost-function来确定logistic-regression模型的gradient-descent-算法"></a> Gradient descent for logistic regression, derived from the cost function</h2><span>$\begin{align*} &amp; Repeat \; \lbrace \newline &amp; \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline &amp; \rbrace \end{align*}$</span><!-- Has MathJax --><p>Vectorized implementation:</p><span>$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - \vec{y})$</span><!-- Has MathJax -->
<h2 id="比gradient-decent更高级的算法"><a class="markdownIt-Anchor" href="#比gradient-decent更高级的算法"></a> Algorithms more advanced than gradient descent</h2><p>“Conjugate gradient”, “BFGS”, and “L-BFGS”.<br>Pros: no learning rate to pick by hand (they choose the step size automatically); often faster.<br>Cons: more complex; in practice most people just call a library implementation.</p><p>How to use them:</p><p>First implement a function that returns both the cost and its gradient (the partial derivatives used in gradient descent). A toy example:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">function [jVal, gradient] = costFunction(theta)</span><br><span class="line">  % Toy cost J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2, minimized at theta = [5; 5]</span><br><span class="line">  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;</span><br><span class="line">  gradient = zeros(2, 1);</span><br><span class="line">  gradient(1) = 2 * (theta(1) - 5);</span><br><span class="line">  gradient(2) = 2 * (theta(2) - 5);</span><br><span class="line">end</span><br></pre></td></tr></table></figure><p>For logistic regression, gradient holds the vector of partial derivatives (<a href="https://en.wikipedia.org/wiki/Partial_derivative" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/Partial_derivative</a>):</p><span>$\frac{1}{m} X^{T} (g(X \theta ) - \vec{y})$</span><!-- Has MathJax --><p>Then pass the function handle to the advanced optimizer:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">options = optimset(&apos;GradObj&apos;, &apos;on&apos;, &apos;MaxIter&apos;, 100);</span><br><span class="line">initialTheta = zeros(2,1);</span><br><span class="line">[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);</span><br></pre></td></tr></table></figure>
<h2 id="multiclass-classification"><a class="markdownIt-Anchor" href="#multiclass-classification"></a> Multiclass classification</h2><p>One-vs-all (one-vs-rest)<br>Run the same algorithm once per class: when fitting the parameters for the current class, treat all the remaining classes as one combined class. This gives a separate set of parameters for each class.</p><p>Mathematically:</p><span>$\begin{align*}&amp; y \in \lbrace0, 1 ... n\rbrace \newline&amp; h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline&amp; h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline&amp; \cdots \newline&amp; h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline&amp; \mathrm{prediction} = \arg\max_i( h_\theta ^{(i)}(x) )\newline\end{align*}$</span><!-- Has MathJax -->
<h1 id="solving-over-fitting-problem"><a class="markdownIt-Anchor" href="#solving-over-fitting-problem"></a> Solving the “overfitting” problem</h1><p>Underfitting = high bias: the model does not fit the training data well.<br>Overfitting = high variance: the model fits the training data too closely, producing an oddly contorted curve that generalizes poorly.</p><p>How can we fix it? (Approaches)</p><ul><li>1) Reduce the number of features (manually, or automatically with a model-selection algorithm)</li><li>2) Regularization: make the parameters smaller; as they approach 0, the curve becomes simpler and less wiggly</li></ul><h2 id="用来帮助regularization并解决over-fitting的特殊cost-function"><a class="markdownIt-Anchor" href="#用来帮助regularization并解决over-fitting的特殊cost-function"></a> A regularized cost function to address “overfitting”</h2><span>$\min_\theta\ \dfrac{1}{2m}\ \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\ \sum_{j=1}^n \theta_j^2 \right]$</span><!-- Has MathJax --><p>If <span>$\lambda$</span><!-- Has MathJax --> is too large, the parameters are pushed so close to 0 that the model no longer fits the training set (underfitting).<br>If it is too small, regularization has little effect and the model can still overfit.</p>
<h3 id="apply-regularization的liner-regression模型求参算法"><a class="markdownIt-Anchor" href="#apply-regularization的liner-regression模型求参算法"></a> Regularized linear regression</h3><p>The following modified algorithms keep the fitted curve as simple as possible.</p><h4 id="gradient-decent的改进"><a class="markdownIt-Anchor" href="#gradient-decent的改进"></a> Regularized gradient descent</h4><span>$\begin{align*} &amp; \text{Repeat}\ \lbrace \newline &amp; \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline &amp; \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &amp;\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2...n\rbrace\newline &amp; \rbrace \end{align*}$</span><!-- Has MathJax --><h4 id="normal-equation的改进"><a class="markdownIt-Anchor" href="#normal-equation的改进"></a> Regularized normal equation</h4><span>$\begin{align*}&amp; \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline&amp; \text{where}\ \ L = \begin{bmatrix} 0 &amp; &amp; &amp; &amp; \newline &amp; 1 &amp; &amp; &amp; \newline &amp; &amp; 1 &amp; &amp; \newline &amp; &amp; &amp; \ddots &amp; \newline &amp; &amp; &amp; &amp; 1 \newline\end{bmatrix}\end{align*}$</span><!-- Has MathJax -->
<h3 id="apply-regularization的logistic-regression模型求参算法"><a class="markdownIt-Anchor" href="#apply-regularization的logistic-regression模型求参算法"></a> Regularized logistic regression</h3><p>The regularized cost function for logistic regression (θ<sub>0</sub> is not regularized, so the sum starts at j = 1):</p><span>$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$</span><!-- Has MathJax --><p>Its gradient descent update has the same form as for regularized linear regression above, with h<sub>θ</sub>(x) = g(θ<sup>T</sup>x).</p><h3 id="advance-optimization"><a class="markdownIt-Anchor" href="#advance-optimization"></a> Advanced Optimization</h3><p>To use the advanced optimizers “Conjugate gradient”, “BFGS”, and “L-BFGS” with regularization, the cost function passed to them must also compute the regularized cost and the regularized gradient.</p>]]></content>
      
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
            <tag> classification </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 2</title>
      <link href="2017/12/20/markdown/Trending/MachineLearning/MachineLearning_2/"/>
      <url>2017/12/20/markdown/Trending/MachineLearning/MachineLearning_2/</url>
      
        <content type="html"><![CDATA[<p><a href="https://www.coursera.org/learn/machine-learning/home/week/2" target="_blank" rel="noopener">https://www.coursera.org/learn/machine-learning/home/week/2</a></p><h2 id="multivariate-linear-regression"><a class="markdownIt-Anchor" href="#multivariate-linear-regression"></a> Multivariate Linear Regression</h2><p>Last week covered linear regression with a single variable: the model had only one input x. Multivariate linear regression uses several features:</p><span>$h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$</span><!-- Has MathJax --><p>Notation:</p><span>$\begin{align*}x_j^{(i)} &amp;= \text{value of feature } j \text{ in the }i^{th}\text{ training example} \newline x^{(i)}&amp; = \text{the input (features) of the }i^{th}\text{ training example} \newline m &amp;= \text{the number of training examples} \newline n &amp;= \text{the number of features} \end{align*}$</span><!-- Has MathJax --><p>In matrix form this becomes:</p><span>$\begin{align*}h_\theta(x) =\begin{bmatrix}\theta_0 \hspace{2em} \theta_1 \hspace{2em} ... \hspace{2em} \theta_n\end{bmatrix}\begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix}= \theta^T x\end{align*}$</span><!-- Has MathJax --><p>where</p><span>$x_{0}^{(i)} =1 \text{ for } (i\in \lbrace 1,\dots, m \rbrace )$</span><!-- Has MathJax -->
<h2 id="gradient-descent-for-multiple-variables"><a class="markdownIt-Anchor" href="#gradient-descent-for-multiple-variables"></a> Gradient Descent for Multiple Variables</h2><p>For linear regression with multiple variables, the algorithm for finding the optimal parameters is called gradient descent for multiple variables. Applying what we learned before, it is written as follows (all θ<sub>j</sub> are updated simultaneously):</p><span>$\begin{align*}&amp; \text{repeat until convergence:} \; \lbrace \newline \; &amp; \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \; &amp; \text{for j := 0...n}\newline \rbrace\end{align*}$</span><!-- Has MathJax --><h3 id="gradient-descent-in-practice-i-feature-scaling"><a class="markdownIt-Anchor" href="#gradient-descent-in-practice-i-feature-scaling"></a> Gradient Descent in Practice I - Feature Scaling</h3><p>One technique is to rescale the training data so that every feature falls in roughly the same range. Gradient descent then converges much faster. Typical target ranges:<br>−1 ≤ x<sub>i</sub> ≤ 1<br>or<br>−0.5 ≤ x<sub>i</sub> ≤ 0.5</p><p>Two common techniques are feature scaling and mean normalization, which combine into:</p><span>$x_i := \dfrac{x_i - \mu_i}{s_i}$</span><!-- Has MathJax --><p><span>$\mu_i$</span><!-- Has MathJax --> is the average of all the values of feature i, and <span>$s_i$</span><!-- Has MathJax --> is either the range of values (max − min) or the standard deviation.</p>
<h3 id="gradient-descent-in-practice-ii-learning-rate"><a class="markdownIt-Anchor" href="#gradient-descent-in-practice-ii-learning-rate"></a> Gradient Descent in Practice II - Learning Rate</h3><p>How do we choose a good learning rate α?</p><p>Debugging gradient descent: plot J(θ) against the number of iterations. If J(θ) increases, α is probably too large (each step overshoots the minimum).</p><p>Automatic convergence test: declare convergence if J(θ) decreases by less than a small threshold, such as 0.001, in one iteration. In practice a good threshold is hard to choose and depends on the problem.</p><p>If α is sufficiently small, then J(θ) will decrease on every iteration.</p>
<h2 id="features-and-polynomial-regression"><a class="markdownIt-Anchor" href="#features-and-polynomial-regression"></a> Features and Polynomial Regression</h2><p>In ML, a feature is one of the input attributes of our data; for example, the width of a house is a feature.<br>Sometimes, to fit the model better, we can combine two features into one, e.g. use the area (length × width) as a single feature instead of the length and width separately.</p><p>Polynomial Regression<br>The hypothesis does not have to be linear in the original features. For example:</p><ul><li>quadratic function</li></ul><span>$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$</span><!-- Has MathJax --><ul><li>cubic function</li></ul><span>$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$</span><!-- Has MathJax --><ul><li>square root function</li></ul><span>$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$</span><!-- Has MathJax --><p>The key idea is that the rough shape of each model should match the data we collected. There are algorithms that help us choose the model.</p><p>Feature scaling is very important with these models, because squaring or cubing a feature greatly expands its range (e.g. if x<sub>1</sub> ranges from 1 to 1000, x<sub>1</sub><sup>2</sup> ranges from 1 to 10<sup>6</sup>).</p>
<h1 id="normal-equation"><a class="markdownIt-Anchor" href="#normal-equation"></a> Normal Equation</h1><p>A very important method that computes <span>$\theta$</span><!-- Has MathJax --> directly from a formula, <span>$\theta = (X^TX)^{-1}X^Ty$</span><!-- Has MathJax -->. It applies to linear regression with multiple features.</p><p>Compared with gradient descent, each has pros and cons.<br>Pros: no need to choose a learning rate (which otherwise takes experience and debugging), and no iterations: θ is computed in one step.<br>Cons: the cost grows quickly with the number of features n: inverting X<sup>T</sup>X is <span>$O(n^3)$</span><!-- Has MathJax -->, whereas gradient descent is <span>$O(kn^2)$</span><!-- Has MathJax -->.<br>As a rule of thumb, consider gradient descent for finding <span>$\theta$</span><!-- Has MathJax --> once the number of features exceeds about 10,000.</p><h2 id="normal-equation的隐藏bug"><a class="markdownIt-Anchor" href="#normal-equation的隐藏bug"></a> A hidden pitfall of the normal equation</h2><p>Sometimes X<sup>T</sup>X is non-invertible, and computing its inverse with inv fails or gives unreliable results. To avoid this, use ‘pinv’ rather than ‘inv’ in Octave. Common causes:<br>1) Redundant features, e.g. one feature is the size in square feet and another is the same size in square meters.<br>2) m ≤ n, i.e. there are at least as many features as training examples. In that case, delete some features or use regularization.</p>
<h1 id="octave-tutorial"><a class="markdownIt-Anchor" href="#octave-tutorial"></a> Octave Tutorial</h1><ul><li>An Octave function can return multiple values: list them in brackets in the function definition, e.g. <code>function [a, b] = f(x)</code>.</li><li>Octave can implement general-purpose functions; the example in the video is a simple implementation of the linear regression cost function, so passing in the X and y matrices returns the value of the cost function.</li></ul><p><strong>Among the matrix operators, .* means element-wise multiplication: each element is multiplied by the corresponding element of the other matrix.</strong></p>]]></content>
      
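The normal equation θ = (X^T X)^-1 X^T y can also be sketched in plain Python. Rather than forming the inverse explicitly, this version solves (X^T X) θ = X^T y by Gauss-Jordan elimination and raises an error when X^T X is numerically singular, which is the situation where the notes recommend `pinv` over `inv` in Octave. It does not implement a pseudo-inverse; the helper names and the 1e-12 singularity threshold are assumptions.

```python
def transpose(A):
    """Rows become columns."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """(m x n) times (n x p) gives (m x p): row of A dotted with column of B."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve(A, b, eps=1e-12):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]  # augmented [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if eps > abs(M[pivot][col]):
            raise ValueError("X^T X is singular; use a pseudo-inverse or drop redundant features")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]


def normal_equation(X, y):
    """theta = (X^T X)^-1 X^T y, computed by solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


if __name__ == "__main__":
    X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]  # first column is x_0 = 1
    y = [5, 7, 9, 11, 13]                         # y = 3 + 2 * x1
    print([round(t, 6) for t in normal_equation(X, y)])  # [3.0, 2.0]
```

No learning rate and no iterations are involved, which is the trade-off listed above; a redundant feature (for example a column that is exactly twice another) makes `solve` raise instead of returning meaningless numbers.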
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>JDK or JRE?</title>
      <link href="2017/11/18/markdown/Java/JDKJRE/"/>
      <url>2017/11/18/markdown/Java/JDKJRE/</url>
      
        <content type="html"><![CDATA[<p><a href="https://serverfault.com/questions/372997/why-jdk-is-installed-with-web-application-servers" target="_blank" rel="noopener">https://serverfault.com/questions/372997/why-jdk-is-installed-with-web-application-servers</a></p><h1 id="whats-the-differnce-between-using-jdk-and-jre"><a class="markdownIt-Anchor" href="#whats-the-differnce-between-using-jdk-and-jre"></a> What’s the difference between using a JDK and a JRE?</h1><p>Whether a JDK or just a JRE is required depends on the particular application server (e.g. JBoss, Tomcat, GlassFish), its strategy for compiling to bytecode, and how it resolves its dependencies at start-up.</p><p>Strictly speaking, if your Java application only executes Java bytecode in the form of classes, a JRE should be enough. Whether that holds in practice depends on the application server: some defensively check for an installed JDK at start-up, while others just throw an exception at the point where compilation is requested.</p><p>Some application servers use javac to compile JSPs to class files and therefore depend on a system JDK being installed. Tomcat, by contrast, bundles its own JSP compiler and can therefore run under a JRE.</p><p>The Java keystore is a feature of Java SE, and both OpenJDK and HotSpot read their defaults from the file<br>JAVA_HOME/lib/security/java.security</p><p>Unless you have changed $JAVA_HOME/lib/security/java.security, the default implementation (keystore.type=jks) looks for $HOME/.keystore, so it is up to you to override the location. Versions 1.5 and 1.6 of the Sun JDK both use that format and default location.</p><p>So changing $JAVA_HOME on its own won’t affect the location of the keystore</p><p>(unless you have actually overridden the keystore location to point into the $JAVA_HOME folder…)</p><p>but it might matter if you are using some non-default provider, or have 
set some non-default options in java.security.</p>]]></content>
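The defensive start-up check mentioned above can be illustrated with a short sketch that looks for a javac executable under JAVA_HOME/bin and reports whether the installation looks like a JDK or a runtime-only JRE. This is a hypothetical helper that assumes the usual directory layout; it is not how any particular application server implements its check.

```python
import os
import sys


def javac_path(java_home):
    """Return the path of javac under java_home/bin, or None if there is none."""
    name = "javac.exe" if sys.platform.startswith("win") else "javac"
    candidate = os.path.join(java_home, "bin", name)
    return candidate if os.path.isfile(candidate) else None


def runtime_kind(java_home):
    """'JDK' if a compiler is present, otherwise 'JRE' (runtime only)."""
    return "JDK" if javac_path(java_home) else "JRE"


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME")
    if home:
        print(home, "looks like a", runtime_kind(home))
    else:
        print("JAVA_HOME is not set")
```

A server that compiles JSPs with the system javac would fail this check on a plain JRE, while one that bundles its own compiler, like Tomcat, would not need it.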
      
      
      
        <tags>
            
            <tag> JDK </tag>
            
            <tag> security </tag>
            
            <tag> basic </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Flink</title>
      <link href="2017/11/11/markdown/Trending/BigData/Flink/"/>
      <url>2017/11/11/markdown/Trending/BigData/Flink/</url>
      
        <content type="html"><![CDATA[<h1 id="setup-the-ide"><a class="markdownIt-Anchor" href="#setup-the-ide"></a> Set up the IDE</h1><p><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/java_api_quickstart.html" target="_blank" rel="noopener">https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/java_api_quickstart.html</a></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> mvn archetype:generate                               \</span></span><br><span class="line">      -DarchetypeGroupId=org.apache.flink              \</span><br><span class="line">      -DarchetypeArtifactId=flink-quickstart-java      \</span><br><span class="line">      -DarchetypeVersion=1.3.2</span><br></pre></td></tr></table></figure><p>Then provide values for the interactive prompts, for example:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">Define value for property &apos;groupId&apos;: learn.rachel.flink</span><br><span class="line">Define value for property &apos;artifactId&apos;: flinksamples</span><br><span class="line">Define value for property &apos;version&apos; 1.0-SNAPSHOT: : 0.1</span><br><span class="line">Define value for property &apos;package&apos; learn.rachel.flink: :</span><br><span class="line">Confirm properties configuration:</span><br><span class="line">groupId: learn.rachel.flink</span><br></pre></td></tr></table></figure><p>Run this command from the parent directory in which the project should be created; the project folder itself must not exist yet.<br>For example, run it under D:\workspaces.<br>After the command finishes, the project root will be D:\workspaces\flinksamples.</p>]]></content>
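The same project can also be generated without the interactive prompts. The sketch below, assuming Maven is on the PATH, passes the values entered above as -D properties together with -DinteractiveMode=false, runs Maven from the parent directory, and refuses to start if the project folder already exists. The Python helper names are hypothetical.

```python
import os
import shutil
import subprocess


def archetype_command(group_id, artifact_id, version, package, flink_version="1.3.2"):
    """Build a non-interactive `mvn archetype:generate` call for the Flink quickstart."""
    return [
        "mvn", "archetype:generate",
        "-DarchetypeGroupId=org.apache.flink",
        "-DarchetypeArtifactId=flink-quickstart-java",
        f"-DarchetypeVersion={flink_version}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",
        f"-Dpackage={package}",
        "-DinteractiveMode=false",  # take the values above instead of prompting
    ]


def generate(parent_dir, artifact_id, **kwargs):
    """Run the archetype from parent_dir; the project folder must not exist yet."""
    project_dir = os.path.join(parent_dir, artifact_id)
    if os.path.exists(project_dir):
        raise FileExistsError(project_dir)
    mvn = shutil.which("mvn")
    if mvn is None:
        raise FileNotFoundError("mvn is not on PATH")
    cmd = archetype_command(artifact_id=artifact_id, **kwargs)
    cmd[0] = mvn  # resolved path (on Windows this finds mvn.cmd)
    subprocess.run(cmd, cwd=parent_dir, check=True)
    return project_dir


if __name__ == "__main__":
    print(" ".join(archetype_command("learn.rachel.flink", "flinksamples",
                                     "0.1", "learn.rachel.flink")))
```

For the example above, the call would be `generate(r"D:\workspaces", "flinksamples", group_id="learn.rachel.flink", version="0.1", package="learn.rachel.flink")`, which should create D:\workspaces\flinksamples.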
      
      
      
        <tags>
            
            <tag> bigdata </tag>
            
            <tag> data streaming </tag>
            
            <tag> flink </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Machine Learning - Week 1</title>
      <link href="2017/09/24/markdown/Trending/MachineLearning/MachineLearning_1/"/>
      <url>2017/09/24/markdown/Trending/MachineLearning/MachineLearning_1/</url>
      
        <content type="html"><![CDATA[<p><a href="https://www.coursera.org/learn/c/lecture/RKFpn/welcome" target="_blank" rel="noopener">https://www.coursera.org/learn/c/lecture/RKFpn/welcome</a></p><h1 id="overview"><a class="markdownIt-Anchor" href="#overview"></a> Overview</h1><h2 id="example-of-machine-learning"><a class="markdownIt-Anchor" href="#example-of-machine-learning"></a> Examples of Machine Learning</h2><ul><li>Database mining</li><li>Applications that can’t be programmed by hand</li><li>Self-customizing programs</li><li>Understanding human learning (the brain, real AI)</li></ul><h2 id="what-is-machine-learning"><a class="markdownIt-Anchor" href="#what-is-machine-learning"></a> What is Machine Learning</h2><p>Two definitions of Machine Learning are offered.</p><ul><li>Arthur Samuel described it as “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.</li><li>Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”</li></ul><p>Example: playing checkers.</p><p>E = the experience of playing many games of checkers</p><p>T = the task of playing checkers.</p><p>P = the probability that the program will win the next game.</p><p>In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning. Others include reinforcement learning and recommender systems.</p><h2 id="supervised-learning"><a class="markdownIt-Anchor" href="#supervised-learning"></a> Supervised Learning</h2><p><strong>Tell the machine: here is the data and here are the correct results; learn from them, and when new data arrives, tell me its result.</strong><br>There are two kinds:</p><ul><li>Regression: predict a continuous-valued output from the data.<br>For example, predict house prices from a chosen set of features.</li><li>Classification: predict a discrete-valued output, i.e. learn the boundary between categories.<br>For example, predict from a chosen set of features whether a tumor is malignant or benign (a discrete result).</li></ul><h2 id="unsupervised-learning"><a class="markdownIt-Anchor" href="#unsupervised-learning"></a> Unsupervised 
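Learning</h2><p><strong>Tell the machine: here is the data; learn from it and find out what relationships or groupings exist.</strong></p><p>There are two kinds:</p><ul><li>Clustering: grouping the data, for example grouping genes.</li><li>Non-clustering: e.g. the “Cocktail Party Algorithm”, which separates the individual voices in audio recorded from a mix of sources.</li></ul><h2 id="model-representation"><a class="markdownIt-Anchor" href="#model-representation"></a> Model Representation</h2><p>h stands for hypothesis.<br>h : X → Y, so that h(x) is a “good” predictor for the corresponding value of y.</p><p>Example model:<br><span>$h_\theta(x)=\theta_0+\theta_1x$</span><!-- Has MathJax --><br><strong>Linear regression with one variable = univariate linear regression</strong></p><h2 id="cost-function"><a class="markdownIt-Anchor" href="#cost-function"></a> Cost Function</h2><p>The cost function measures the accuracy of our hypothesis function.</p><p>The cost function for the linear regression model is the squared error function, also called the mean squared error:</p><span>$J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}_{i}- y_{i} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$</span><!-- Has MathJax --><p>For linear regression (<span>$h_\theta(x)=\theta_0+\theta_1x$</span><!-- Has MathJax -->), the plot of J is always bowl-shaped, i.e. a convex function, so it has no local minimum other than the global one.</p><h3 id="cost-function-intuition-i"><a class="markdownIt-Anchor" href="#cost-function-intuition-i"></a> Cost Function - Intuition I</h3><p>Suppose our hypothesis is <span>$h_\theta(x)=\theta_1x$</span><!-- Has MathJax -->.<br>Assume the form of the model is right, but we don't yet know the correct value of θ_1. As we try different θ_1 values, we record the value of the cost function for each one. Plotting J(θ_1) against θ_1 gives a U-shaped parabola, and its lowest point is the θ_1 we are looking for.</p><p>So finding θ_1 becomes: assume our model is correct, so the data points should lie close to it; for a guessed θ_1, use the cost function to measure the error, then use gradient descent to move toward the optimum, with each step determined by the learning rate and the slope.</p><h3 id="cost-function-intuition-ii"><a class="markdownIt-Anchor" href="#cost-function-intuition-ii"></a> Cost Function - Intuition II</h3><p>Suppose our hypothesis is <span>$h_\theta(x)=\theta_0+\theta_1x$</span><!-- Has MathJax -->.<br>Again assume the model is right, but we don't yet know the correct θ values. As we try different (θ_0, θ_1) pairs and record the cost function for each, we get a 3D bowl-shaped surface whose lowest point gives the θ_0 and θ_1 we want.</p><p>Seen from another angle: put θ_0 on the horizontal axis and θ_1 on the vertical axis, and draw contour lines connecting the (θ_0, θ_1) combinations that give the same value of J(θ_0, θ_1). The result is a set of nested ellipses that looks a bit like a spiral galaxy, and its center is the θ_0 and θ_1 we are after, because that is where the cost function is smallest.</p><h2 id="gradient-descent"><a 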
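class="markdownIt-Anchor" href="#gradient-descent"></a> Gradient Descent</h2><p>The cost function measures how good our model and its parameters are. Gradient descent is an algorithm that helps us find the best parameters.</p><p>In this course,<br>:= denotes assignment<br>= denotes a truth assertion</p><p>The gradient descent algorithm:</p><p>repeat until convergence:</p><span>$\theta_1:=\theta_1-\alpha \frac{d}{d\theta_1} J(\theta_1)$</span><!-- Has MathJax --><p>Gradient Descent for Linear Regression:</p><span>$\begin{align*} \text{repeat until convergence: } \lbrace &amp; \newline \theta_0 := &amp; \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \newline \theta_1 := &amp; \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \newline \rbrace&amp; \end{align*}$</span><!-- Has MathJax --><p>In the formulas above, <span>$\alpha$</span><!-- Has MathJax --> is the learning rate. If it is too large, gradient descent may overshoot and never reach the minimum; if it is too small, it may take a long time to converge.</p><p>The derivative term below gives the slope of the curve:</p><span>$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$</span><!-- Has MathJax --><p>The smaller the slope, the closer we are to a local minimum (the optimum). Gradient descent therefore takes smaller steps automatically as it approaches the minimum, because the derivative term becomes smaller even though α stays fixed. The sign of the derivative also sets the direction of each step, so every update moves downhill.<br>To review how the algorithm works, see:<br><a href="https://www.coursera.org/learn/machine-learning/supplement/QKEdR/gradient-descent-intuition" target="_blank" rel="noopener">https://www.coursera.org/learn/machine-learning/supplement/QKEdR/gradient-descent-intuition</a></p><p>The gradient descent algorithm introduced here is batch gradient descent: a variant that uses the complete training set in every step. Other variants of gradient descent use only part of the data in each step.</p><p>Normalized feature:<br><span>$f_{\text{norm}} = \dfrac{f - f_{\text{mean}}}{f_{\text{max}} - f_{\text{min}}}$</span><!-- Has MathJax --></p><h2 id="概念小结"><a class="markdownIt-Anchor" href="#概念小结"></a> Summary of Concepts</h2><ul><li>Linear regression is a model: our guess about the pattern in the data. It is one of the models used for regression problems in supervised learning.</li><li>The cost function is a criterion for judging how reasonable the model's parameters are. The squared error function covered in this section is the cost function for the linear regression model.</li><li>Gradient descent is an algorithm for finding the best parameters. It is a general optimization method; here it is used to minimize the squared error function.</li></ul><h2 id="matrice-and-vectors"><a class="markdownIt-Anchor" href="#matrice-and-vectors"></a> Matrices and Vectors</h2><h3 id="基本概念"><a class="markdownIt-Anchor" 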
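href="#基本概念"></a> Basic Concepts</h3><p>The size of a matrix is written rows × columns; for example, a 3×2 matrix has 3 rows and 2 columns. Mathematically, this is written as ℝ with the dimensions as a superscript, e.g. <span>$\mathbb{R}^{3 \times 2}$</span><!-- Has MathJax -->.<br>An entry of a matrix is identified by the subscripts (row, column), e.g. <span>$A_{ij}$</span><!-- Has MathJax --> is the entry in row i, column j.</p><p>A vector is a matrix with only one column (an n×1 matrix), written as ℝ with the number of rows as a superscript, e.g. <span>$\mathbb{R}^n$</span><!-- Has MathJax -->.</p><p>A scalar has exactly one row and one column, i.e. it is a single number.</p><p><span>$\mathbb{R}$</span><!-- Has MathJax --> refers to the set of scalar real numbers.<br><span>$\mathbb{R}^n$</span><!-- Has MathJax --> refers to the set of n-dimensional vectors of real numbers.</p><h3 id="matrix运算"><a class="markdownIt-Anchor" href="#matrix运算"></a> Matrix Operations</h3><p>Addition and subtraction: add or subtract the entries in the same positions; both matrices must have the same dimensions.<br>Multiplication or division by a scalar: multiply or divide every entry by the scalar.<br>Matrix-vector multiplication: an m×n matrix multiplied by an n×1 vector results in an m×1 vector.<br>Matrix-matrix multiplication: an m×n matrix multiplied by an n×o matrix results in an m×o matrix.<br>Each entry of the product is a row of the first matrix multiplied by a column of the second (a dot product).</p><p>The concept of matrix multiplication is very important:</p><span>$$\begin{bmatrix} a &amp; b \newline c &amp; d \newline e &amp; f \end{bmatrix} *\begin{bmatrix} w &amp; x \newline y &amp; z \newline \end{bmatrix} =\begin{bmatrix} a*w + b*y &amp; a*x + b*z \newline c*w + d*y &amp; c*x + d*z \newline e*w + f*y &amp; e*x + f*z\end{bmatrix}$$</span><!-- Has MathJax --><h3 id="matrix运算的特点"><a class="markdownIt-Anchor" href="#matrix运算的特点"></a> Properties of Matrix Operations</h3><p>Matrix multiplication is not commutative: in general A×B ≠ B×A.<br>It is associative: (A×B)×C = A×(B×C).<br>The identity matrix I has 1s on the diagonal from the top-left to the bottom-right corner and 0s everywhere else; multiplying a matrix by the identity matrix leaves it unchanged.</p><h3 id="特殊的matrix"><a class="markdownIt-Anchor" href="#特殊的matrix"></a> Special Matrices</h3><p>Inverse matrix: only a square (m×m) matrix can have an inverse.<br><span>$A A^{-1} = A^{-1} A = I$</span><!-- Has MathJax --><br>A matrix that has no inverse (for example, the all-zero matrix) is called singular or degenerate.</p><p>Transpose: rows become columns, i.e. <span>$(A^T)_{ij} = A_{ji}$</span><!-- Has MathJax -->.</p>]]></content>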
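The univariate cost function and the two simultaneous update rules of gradient descent for linear regression can be sketched in plain Python as follows. The function names, alpha = 0.1, the fixed iteration count and the tiny data set (points on y = 1 + 2x) are illustrative assumptions rather than material from the course.

```python
def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def batch_gradient_descent(xs, ys, alpha=0.1, iters=1500):
    """Apply both update rules on the full training set, iters times."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [h(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Both gradients use the old thetas: a simultaneous update.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]  # made-up data on the line y = 1 + 2x
    print(cost(0.0, 0.0, xs, ys))  # 20.5
    t0, t1 = batch_gradient_descent(xs, ys)
    print(round(t0, 3), round(t1, 3))  # 1.0 2.0
```

Raising alpha well above this value (for example to 0.5 on this data) makes J grow on every iteration instead of shrinking, which is the overshooting behavior described in the notes.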
      
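A small sketch of the matrix rules from the Machine Learning - Week 1 notes, using nested Python lists: `matmul` applies the row-times-column rule, so the 3 × 2 by 2 × 2 product follows the same pattern as the a, b, ..., z formula, while `identity` and `transpose` mirror the definitions of the identity matrix and the transpose. The helper names are illustrative; real code would normally use a numeric library.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, giving an m x o matrix."""
    n = len(A[0])
    if len(B) != n:
        raise ValueError("columns of A must equal rows of B")
    o = len(B[0])
    # Entry (i, j) is row i of A times column j of B, summed term by term.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(o)]
            for i in range(len(A))]


def identity(n):
    """n x n identity matrix: 1 on the main diagonal, 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(A):
    """Rows become columns: result[j][i] == A[i][j]."""
    return [list(col) for col in zip(*A)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2, in the role of [a b; c d; e f]
    B = [[7, 8], [9, 10]]         # 2 x 2, in the role of [w x; y z]
    print(matmul(A, B))           # [[25, 28], [57, 64], [89, 100]]
    print(matmul(A, [[1], [1]]))  # matrix-vector: [[3], [7], [11]]
```

Swapping the operands of a product generally changes the result (or is not even defined, as with B times A here), which is the non-commutativity noted above.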
      
      
        <tags>
            
            <tag> Machine Learning </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>keytool</title>
      <link href="2017/08/09/markdown/BackToBasic/Linux/Keytool/"/>
      <url>2017/08/09/markdown/BackToBasic/Linux/Keytool/</url>
      
        <content type="html"><![CDATA[<h2 id="p12"><a class="markdownIt-Anchor" href="#p12"></a> p12</h2><ul><li>List the contents of a .p12 (PKCS #12) keystore, then export a certificate from it by alias</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">keytool -v -list -storetype pkcs12 -keystore KEYSTORE_ABSOLUTE_PATH.p12</span><br><span class="line">keytool -exportcert -keystore KEYSTORE_ABSOLUTE_PATH.p12 -storetype PKCS12 -storepass KEYSTORE_PASSWORD -alias ALIAS -file EXPORTED_CERT_NAME.crt</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> linux </tag>
            
            <tag> keytool </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Certificates Related</title>
      <link href="2017/08/05/markdown/BackToBasic/Security/CertificatesRelated/"/>
      <url>2017/08/05/markdown/BackToBasic/Security/CertificatesRelated/</url>
      
        <content type="html"><![CDATA[<h1 id="certificates"><a class="markdownIt-Anchor" href="#certificates"></a> Certificates</h1><p>What is an SSL certificate chain?</p><p><a href="https://support.dnsimple.com/articles/what-is-ssl-certificate-chain/" target="_blank" rel="noopener">https://support.dnsimple.com/articles/what-is-ssl-certificate-chain/</a></p><p>Key term: intermediate CA, a CA whose certificate links the server certificate to the trusted root in the chain.</p><h1 id="storage-of-keys-and-certificates"><a class="markdownIt-Anchor" href="#storage-of-keys-and-certificates"></a> Storage of Keys and Certificates</h1><p><a href="https://en.wikipedia.org/wiki/PKCS_12" target="_blank" rel="noopener">https://en.wikipedia.org/wiki/PKCS_12</a></p><ul><li><p><strong>.p12</strong></p><p>A PKCS #12 archive; the usual tool for working with it is openssl.</p></li><li><p><strong>.pfx</strong></p><p>Microsoft’s PFX format, the predecessor of PKCS #12; today .pfx and .p12 are used interchangeably for PKCS #12 files.</p></li><li><p><strong>.pem</strong></p><p>Lists the certificates and possibly private keys as Base64 strings in a text file.</p></li></ul><h2 id="list-of-all-kinds-of-files-and-contents"><a class="markdownIt-Anchor" href="#list-of-all-kinds-of-files-and-contents"></a> List of certificate file types and their contents</h2><p><a href="https://blogs.msdn.microsoft.com/kaushal/2010/11/04/various-ssltls-certificate-file-typesextensions/" target="_blank" rel="noopener">https://blogs.msdn.microsoft.com/kaushal/2010/11/04/various-ssltls-certificate-file-typesextensions/</a></p>]]></content>
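Since a .pem file is just Base64 text between BEGIN and END marker lines, its blocks can be separated and decoded with the Python standard library alone. The function below is a hypothetical helper: it returns each block's label (such as CERTIFICATE or PRIVATE KEY) with the decoded DER bytes, but it does not parse certificate fields, which needs openssl or a crypto library.

```python
import base64
import re
import sys

# Header line, Base64 body, and a footer whose label matches the header.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\s+(.*?)\s+-----END \1-----", re.DOTALL)


def read_pem_blocks(text):
    """Return a list of (label, der_bytes) pairs, one per PEM block in text."""
    blocks = []
    for label, body in PEM_BLOCK.findall(text):
        der = base64.b64decode("".join(body.split()))  # drop line breaks, then decode
        blocks.append((label, der))
    return blocks


if __name__ == "__main__":
    # usage (hypothetical file name): python read_pem.py bundle.pem
    for path in sys.argv[1:]:
        with open(path) as f:
            for label, der in read_pem_blocks(f.read()):
                print(path, label, len(der), "bytes")
```

A bundle holding a server certificate plus its intermediate CA certificate would show up as two CERTIFICATE blocks, in the order they appear in the file.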
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> basic </tag>
            
            <tag> SSL </tag>
            
            <tag> TLS </tag>
            
            <tag> CA </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Java JCE</title>
      <link href="2017/08/03/markdown/BackToBasic/Security/CheckJavaJCE/"/>
      <url>2017/08/03/markdown/BackToBasic/Security/CheckJavaJCE/</url>
      
        <content type="html"><![CDATA[<h1 id="background"><a class="markdownIt-Anchor" href="#background"></a> Background</h1><p><a href="http://docs.oracle.com/javase/7/docs/technotes/guides/security/SunProviders.html" target="_blank" rel="noopener">http://docs.oracle.com/javase/7/docs/technotes/guides/security/SunProviders.html</a></p><p><a href="http://dino.ciuffetti.info/2016/04/how-to-check-if-jce-unlimited-strength-policy-is-installed/" target="_blank" rel="noopener">http://dino.ciuffetti.info/2016/04/how-to-check-if-jce-unlimited-strength-policy-is-installed/</a></p><h2 id="checking-jce-using-java-command-line"><a class="markdownIt-Anchor" href="#checking-jce-using-java-command-line"></a> Checking JCE using the Java command line</h2><p>Run the following commands to check the JCE policy strength on Windows. With the unlimited-strength policy installed, each value should be &gt;= 256; the default (limited) policy caps AES at 128 bits.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('AES'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('DES'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('DESede'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC2'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC4'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RSA'));"</span><br><span class="line">jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC5'));"</span><br></pre></td></tr></table></figure><p>Example output on Windows with the JCE Unlimited Strength policy installed; 2147483647 (Integer.MAX_VALUE) means no limit is imposed:</p><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('AES'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('DES'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('DESede'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC2'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC4'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RSA'));"</span><br><span class="line">2147483647</span><br><span class="line">C:\java\jdk1.8.0_91\bin&gt;jrunscript -e "print (javax.crypto.Cipher.getMaxAllowedKeyLength('RC5'));"</span><br><span class="line">2147483647</span><br></pre></td></tr></table></figure><p>The same commands work on Linux. Just make sure the jrunscript you run is the one under the &lt;JRE/JDK_home&gt;/bin directory of the installation you want to check.</p>]]></content>
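To run the check for every algorithm in one go, a small Python wrapper can invoke the same jrunscript one-liner per algorithm and classify the printed value. This is a sketch with hypothetical helper names: 2147483647 is Integer.MAX_VALUE (no limit), 256 is the threshold used in the notes, and you can pass the path of the jrunscript under the JRE/JDK bin directory you want to check.

```python
import shutil
import subprocess

ALGORITHMS = ["AES", "DES", "DESede", "RC2", "RC4", "RSA", "RC5"]
UNLIMITED = 2**31 - 1  # Integer.MAX_VALUE: the policy imposes no limit


def jrunscript_args(jrunscript, algorithm):
    """Argument list for the same one-liner as in the notes, for one algorithm."""
    script = f"print (javax.crypto.Cipher.getMaxAllowedKeyLength('{algorithm}'));"
    return [jrunscript, "-e", script]


def parse_output(stdout):
    """Take the last printed token; tolerate a float-style rendering too."""
    return int(float(stdout.split()[-1]))


def interpret(max_len, threshold=256):
    """Classify the value returned by getMaxAllowedKeyLength."""
    if max_len == UNLIMITED:
        return "unlimited"
    return "ok" if max_len >= threshold else "restricted"


def check_all(jrunscript=None):
    """Run the check for every algorithm with the given (or first found) jrunscript."""
    exe = jrunscript or shutil.which("jrunscript")
    if exe is None:
        raise FileNotFoundError("jrunscript not found; pass its path under the JRE/JDK bin")
    results = {}
    for alg in ALGORITHMS:
        proc = subprocess.run(jrunscript_args(exe, alg), capture_output=True,
                              text=True, check=True)
        results[alg] = interpret(parse_output(proc.stdout))
    return results


if __name__ == "__main__":
    try:
        for alg, verdict in check_all().items():
            print(alg, verdict)
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError) as exc:
        print("check failed:", exc)
```

Passing the argument list directly to `subprocess.run` avoids the quoting differences between the Windows and Linux shells.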
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> basic </tag>
            
            <tag> java </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ansible</title>
      <link href="2017/07/20/markdown/BackToBasic/Linux/Ansible/Ansible/"/>
      <url>2017/07/20/markdown/BackToBasic/Linux/Ansible/Ansible/</url>
      
        <content type="html"><![CDATA[<h1 id="setup-ansible"><a class="markdownIt-Anchor" href="#setup-ansible"></a> Set up Ansible</h1><p><a href="http://docs.ansible.com/ansible/latest/intro_getting_started.html" target="_blank" rel="noopener">http://docs.ansible.com/ansible/latest/intro_getting_started.html</a></p><h2 id="setup-and-ping-all-the-hosts"><a class="markdownIt-Anchor" href="#setup-and-ping-all-the-hosts"></a> Set up and ping all the hosts</h2><p>To set up SSH:</p><ol><li>Generate a key pair</li></ol> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen -t rsa -f /&lt;path-to-key&gt;/authorized_keys.myuserid</span><br></pre></td></tr></table></figure><ul><li>This command produces a key pair: the private key, plus the matching public key with a .pub suffix.</li><li>myuserid should be a user that exists on all the target servers and has access to them.</li></ul><ol start="2"><li>Add the private key to the SSH agent on the control machine</li></ol> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ssh-agent bash</span><br><span class="line">ssh-add /&lt;path-to-key&gt;/authorized_keys.myuserid</span><br></pre></td></tr></table></figure><ol start="3"><li><p>List all hosts in the /etc/ansible/hosts file</p></li><li><p>Log in to the control machine as myuserid and start running commands</p></li></ol> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ansible all -m ping -u liurx5 -i ./DEV.hosts --become --ask-become-pass</span><br><span class="line">ansible all -a "/bin/echo hello" --become --ask-become-pass</span><br></pre></td></tr></table></figure><h2 id="run-playbook-against-target-hosts-as-root"><a 
class="markdownIt-Anchor" href="#run-playbook-against-target-hosts-as-root"></a> Run a playbook against target hosts as root</h2> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ansible-playbook checkRequiredApps.yml -i ./hosts --ask-become-pass</span><br></pre></td></tr></table></figure><h2 id="task-samples"><a class="markdownIt-Anchor" href="#task-samples"></a> Task Samples</h2>]]></content>
      
      
      
        <tags>
            
            <tag> linux </tag>
            
            <tag> shell </tag>
            
            <tag> ansible </tag>
            
            <tag> automation </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Java Code Style</title>
      <link href="2017/07/12/markdown/BackToBasic/Security/JavaCodeStyle/"/>
      <url>2017/07/12/markdown/BackToBasic/Security/JavaCodeStyle/</url>
      
        <content type="html"><![CDATA[<h1 id="reference-standard"><a class="markdownIt-Anchor" href="#reference-standard"></a> Reference Standard</h1><h2 id="google-java-style"><a class="markdownIt-Anchor" href="#google-java-style"></a> Google Java style</h2><p><a href="https://google.github.io/styleguide/javaguide.html" target="_blank" rel="noopener">https://google.github.io/styleguide/javaguide.html</a></p><ul><li><p>Source files are encoded in UTF-8.</p></li><li><p>Tab characters are not used for indentation; each new block is indented by two spaces.</p></li><li><p>A source file consists of, in order:</p><ol><li>License or copyright information, if present</li><li>Package statement</li><li>Import statements (no wildcard imports)</li><li>Exactly one top-level class</li></ol></li></ul><blockquote><p>Never split overloads: constructors or methods with the same name appear together, with no other code in between.<br>Braces are always used with if, else, for, do and while statements, even when the body is empty or contains a single statement.<br>Braces follow the Kernighan and Ritchie style (“Egyptian brackets”), for example:</p></blockquote>  <figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> () -&gt; &#123;</span><br><span class="line">  <span class="keyword">while</span> (condition()) &#123;</span><br><span class="line">    method();</span><br><span class="line">  &#125;</span><br><span 
class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="keyword">new</span> MyClass() &#123;</span><br><span class="line">  <span class="meta">@Override</span> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">method</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="keyword">if</span> (condition()) &#123;</span><br><span class="line">      <span class="keyword">try</span> &#123;</span><br><span class="line">        something();</span><br><span class="line">      &#125; <span class="keyword">catch</span> (ProblemException e) &#123;</span><br><span class="line">        recover();</span><br><span class="line">      &#125;</span><br><span class="line">    &#125; <span class="keyword">else</span> <span class="keyword">if</span> (otherCondition()) &#123;</span><br><span class="line">      somethingElse();</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">      lastThing();</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><blockquote><p>line-wrapping normally at 100, but there are exceptions; wrapped lines indent 4+ spaces</p></blockquote><ul><li><strong>Exactly one blank line</strong> separates each section that is present</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> java </tag>
            
            <tag> code style </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Postgres database common</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/psql/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/psql/</url>
      
        <content type="html"><![CDATA[<h1 id="login-database"><a class="markdownIt-Anchor" href="#login-database"></a> Log in to the database</h1><p>Connect with psql, passing the database name followed by the user name (both are ambari here):</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">[root@postgres-server01 keytabs]# psql ambari ambari</span><br><span class="line">Password for user ambari:</span><br><span class="line">psql (9.2.18)</span><br><span class="line">Type "help" for help.</span><br><span class="line"></span><br><span class="line">ambari=#</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> postgres </tag>
            
            <tag> database </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari UI integrate with AD</title>
      <link href="2017/07/10/markdown/Trending/BigData/Ambari_UI_AD/"/>
      <url>2017/07/10/markdown/Trending/BigData/Ambari_UI_AD/</url>
      
        <content type="html"><![CDATA[<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">[root@hdf-server01 ambari-server]# ambari-server setup-ldap</span><br><span class="line">Using python  /usr/bin/python</span><br><span class="line">Setting up LDAP properties...</span><br><span class="line">Primary URL* &#123;host:port&#125;: ldapServer:389</span><br><span class="line">Secondary URL &#123;host:port&#125; :</span><br><span class="line">Use SSL* [true/false] (false):</span><br><span class="line">User object class* (person):</span><br><span class="line">User name attribute* (sAMAccountName):</span><br><span class="line">Group object class* (group):</span><br><span class="line">Group name attribute* (cn):</span><br><span class="line">Group member attribute* (member):</span><br><span class="line">Distinguished name attribute* ():CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span 
class="line">Referral method [follow/ignore] (ignore):</span><br><span class="line">Bind anonymously* [true/false] (false):</span><br><span class="line">Handling behavior for username collisions [convert/skip] for LDAP sync* (skip):</span><br><span class="line">Manager DN* :CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">Enter Manager Password* :</span><br><span class="line">Re-enter password:</span><br><span class="line">====================</span><br><span class="line">Review Settings</span><br><span class="line">====================</span><br><span class="line">authentication.ldap.managerDn: CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">authentication.ldap.managerPassword: *****</span><br><span class="line">Save settings [y/n] (y)?</span><br><span class="line">Saving...done</span><br><span class="line">Ambari Server 'setup-ldap' completed successfully.</span><br></pre></td></tr></table></figure><p>After setup, restart Ambari Server and then trigger the LDAP sync:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ambari-server restart</span><br><span class="line">ambari-server sync-ldap --all</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
            <tag> security integration </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari UI integrate with AD</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Ambari/Ambari_UI_AD/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Ambari/Ambari_UI_AD/</url>
      
        <content type="html"><![CDATA[<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">[root@hdf-server01 ambari-server]# ambari-server setup-ldap</span><br><span class="line">Using python  /usr/bin/python</span><br><span class="line">Setting up LDAP properties...</span><br><span class="line">Primary URL* &#123;host:port&#125;: ldapServer:389</span><br><span class="line">Secondary URL &#123;host:port&#125; :</span><br><span class="line">Use SSL* [true/false] (false):</span><br><span class="line">User object class* (person):</span><br><span class="line">User name attribute* (sAMAccountName):</span><br><span class="line">Group object class* (group):</span><br><span class="line">Group name attribute* (cn):</span><br><span class="line">Group member attribute* (member):</span><br><span class="line">Distinguished name attribute* ():CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span 
class="line">Referral method [follow/ignore] (ignore):</span><br><span class="line">Bind anonymously* [true/false] (false):</span><br><span class="line">Handling behavior for username collisions [convert/skip] for LDAP sync* (skip):</span><br><span class="line">Manager DN* :CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">Enter Manager Password* :</span><br><span class="line">Re-enter password:</span><br><span class="line">====================</span><br><span class="line">Review Settings</span><br><span class="line">====================</span><br><span class="line">authentication.ldap.managerDn: CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">authentication.ldap.managerPassword: *****</span><br><span class="line">Save settings [y/n] (y)?</span><br><span class="line">Saving...done</span><br><span class="line">Ambari Server 'setup-ldap' completed successfully.</span><br></pre></td></tr></table></figure><p>After setup, restart Ambari Server and then run the sync command to import the LDAP users and groups.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ambari-server restart</span><br><span class="line">ambari-server sync-ldap --all</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
            <tag> security integration </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Reset Ranger Password</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Ambari/ResetPassword/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Ambari/ResetPassword/</url>
      
        <content type="html"><![CDATA[<h1 id="forgot-the-username-and-password"><a class="markdownIt-Anchor" href="#forgot-the-username-and-password"></a> Forgot the Grafana admin username and password</h1><p><a href="https://community.hortonworks.com/content/supportkb/49508/how-to-change-grafana-admin-password-when-the-pass.html" target="_blank" rel="noopener">https://community.hortonworks.com/content/supportkb/49508/how-to-change-grafana-admin-password-when-the-pass.html</a></p><h1 id="ranger-ui-password-missing-after-switching-the-ui-authentication"><a class="markdownIt-Anchor" href="#ranger-ui-password-missing-after-switching-the-ui-authentication"></a> Ranger UI password missing after switching the UI authentication</h1><p><a href="https://community.hortonworks.com/questions/4408/is-there-any-way-to-reset-ranger-admin-ui-password.html" target="_blank" rel="noopener">https://community.hortonworks.com/questions/4408/is-there-any-way-to-reset-ranger-admin-ui-password.html</a></p><p>Edit the PostgreSQL client authentication file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /var/lib/pgsql/data/pg_hba.conf</span><br></pre></td></tr></table></figure><p>Add the line below so that the <code>rangerdba</code> user can connect locally without a password. PostgreSQL uses the first matching line, so place it above the other <code>local</code> entries, then reload PostgreSQL so the change takes effect.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">local all rangerdba trust</span><br></pre></td></tr></table></figure><p>Connect to the Ranger database and reset the password hash of the <code>admin</code> user:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">psql rangerdb -U rangerdba</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">update x_portal_user set password = 'ceb4f32325eda6142bd65215f4c0f371' where login_id = 'admin';</span><br></pre></td></tr></table></figure><p>Because <code>trust</code> allows logins without a password, remove the line from pg_hba.conf and reload PostgreSQL again after the password has been reset.</p>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> ranger </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>kafka message transaction</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaMessagingTransaction/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaMessagingTransaction/</url>
      
        <content type="html"><![CDATA[<p><a href="https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka" target="_blank" rel="noopener">https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka</a></p><p><a href="https://medium.com/@andrew_schofield/does-apache-kafka-do-acid-transactions-647b207f3d0e" target="_blank" rel="noopener">https://medium.com/@andrew_schofield/does-apache-kafka-do-acid-transactions-647b207f3d0e</a></p><blockquote><p>So, does Apache Kafka do ACID transactions? Absolutely not. No way. Can you get a similar effect? If you design your applications in the right way, yes. Does it matter? In many cases, not really, but when it does, you absolutely don’t want to get it wrong. Just take the time to understand the guarantees that you need to make your system reliable and choose accordingly.</p></blockquote><p>DB-&gt;Topic: an XA transaction can be done.</p><p>Topic-&gt;DB: an XA transaction is hard to implement.</p>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
            <tag> transaction </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka Monitoring</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaMonitoring/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaMonitoring/</url>
      
        <content type="html"><![CDATA[<h1 id="kafka-monitoring"><a class="markdownIt-Anchor" href="#kafka-monitoring"></a> Kafka monitoring</h1><p>Ambari turns on JMX for Kafka by default:</p><ul><li>the default installation sets the JMX port to 16667</li><li>the default installation has no security turned on</li></ul><h2 id="check-the-default-jmx-monitoring-settings"><a class="markdownIt-Anchor" href="#check-the-default-jmx-monitoring-settings"></a> Check the default JMX monitoring settings</h2><p>On your local machine, go to the JDK folder and run jconsole:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&lt;JDK_Home&gt;/bin/jconsole</span><br></pre></td></tr></table></figure><p>In the jconsole UI, select Remote Process and enter kafkaBrokerHostName:16667 as the connection string to connect to the Kafka broker.<br><img src="images/KafkaMonitoring/jConsole_Kafka.PNG" alt="jConsole_Kafka"></p>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
            <tag> monitoring </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka Env Debugging Tools</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaTools/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KafkaTools/</url>
      
        <content type="html"><![CDATA[<p>Install Kafka Manager under /opt/kafka-tools, then start Kafka Manager (port 8888) and Kafdrop (port 8889) in the background:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">mkdir -p /opt/kafka-tools</span><br><span class="line">cp -R /path/kafka-manager-1.3.3.7 /opt/kafka-tools</span><br><span class="line">nohup /opt/kafka-tools/kafka-manager-1.3.3.7/bin/kafka-manager -Dconfig.file=/opt/kafka-tools/kafka-manager-1.3.3.7/conf/hdf-server.conf -Dhttp.port=8888 &gt;/dev/null 2&gt;&amp;1 &amp;</span><br><span class="line"></span><br><span class="line">nohup java -jar /opt/kafka-tools/kafkadrop/kafdrop-2.0.0.jar --zookeeper.connect=hdf-server03:2181,hdf-server04:2181,hdf-server05:2181 --server.port=8889 &gt;/dev/null 2&gt;&amp;1 &amp;</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
            <tag> tools </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka Daily Maintenance</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KakfaDailyMaintain/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/KakfaDailyMaintain/</url>
      
        <content type="html"><![CDATA[<h1 id="kafka-common-configuration"><a class="markdownIt-Anchor" href="#kafka-common-configuration"></a> Kafka Common configuration</h1><h2 id="auto-start"><a class="markdownIt-Anchor" href="#auto-start"></a> Auto start</h2><p>For Ambari-managed Kafka, configure auto start from the Ambari console:</p><p>http://[ambari-host]:8080/#/main/admin/serviceAutoStart</p><p>By default, the Ambari auto start feature is disabled for Kafka and ZooKeeper. In any environment other than <strong>DEV</strong>, it should be enabled.</p><h2 id="change-kafka-configuration"><a class="markdownIt-Anchor" href="#change-kafka-configuration"></a> Change Kafka configuration</h2><p>To change the Kafka configuration:</p><ul><li>For an Ambari-managed Kafka cluster,<ul><li>modify it from the Ambari console.</li></ul></li><li>For a manually configured Kafka cluster,<ul><li>modify the broker's server.properties file.</li></ul></li></ul><h3 id="configurations"><a class="markdownIt-Anchor" href="#configurations"></a> Configurations</h3><ul><li>Allow automatic topic creation</li></ul><p><strong>auto.create.topics.enable</strong></p><p>Default is true. It should be disabled in the <strong>PROD</strong> environment.</p><ul><li>Allow topic deletion</li></ul><p><strong>delete.topic.enable</strong></p><p>Default is false. 
It should be enabled in the <strong>DEV</strong> environment.</p><h2 id="useful-commands"><a class="markdownIt-Anchor" href="#useful-commands"></a> Useful Commands</h2><p>The commands below assume Kafka is installed in the default path used by the Hortonworks Ambari management pack.</p><ul><li>Create a topic</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-topics.sh --create \</span><br><span class="line">--zookeeper zookeeperserver1:2181,zookeeperserver2:2181,zookeeperserver3:2181 \</span><br><span class="line">--replication-factor 3 \</span><br><span class="line">--partitions 1 \</span><br><span class="line">--topic my-new-topic</span><br></pre></td></tr></table></figure><ul><li>List all existing topics</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-topics.sh --list \</span><br><span class="line">--zookeeper zookeeperserver1:2181,zookeeperserver2:2181,zookeeperserver3:2181</span><br></pre></td></tr></table></figure><ul><li>Delete a topic (this only takes effect when <strong>delete.topic.enable</strong> is true)</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-topics.sh --delete \</span><br><span class="line">--zookeeper zookeeperserver1:2181,zookeeperserver2:2181,zookeeperserver3:2181 \</span><br><span class="line">--topic my-new-topic</span><br></pre></td></tr></table></figure><ul><li>Describe a topic</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td 
class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-topics.sh \</span><br><span class="line">--zookeeper zookeeperserver1:2181,zookeeperserver2:2181,zookeeperserver3:2181 \</span><br><span class="line">--describe --topic my-new-topic</span><br></pre></td></tr></table></figure><ul><li>Publish messages to a topic</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-producer.sh \</span><br><span class="line">--broker-list kafka-broker1:6667,kafka-broker2:6667,kafka-broker3:6667 \</span><br><span class="line">--sync --topic my-new-topic</span><br></pre></td></tr></table></figure><ul><li>Consume a topic</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh \</span><br><span class="line">--bootstrap-server kafka-broker1:6667,kafka-broker2:6667,kafka-broker3:6667 \</span><br><span class="line">--topic my-new-topic --from-beginning</span><br></pre></td></tr></table></figure><ul><li>Change topic message retention time</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-configs.sh \</span><br><span class="line">--zookeeper zookeeperserver1:2181,zookeeperserver2:2181,zookeeperserver3:2181 \</span><br><span class="line">--alter --entity-type topics \</span><br><span class="line">--entity-name my-new-topic \</span><br><span class="line">--add-config 
retention.ms=86400000</span><br></pre></td></tr></table></figure><p>The example sets the retention to 86400000 ms, that is, 24 hours.</p><h1 id="kafka-monitring-via-ambari"><a class="markdownIt-Anchor" href="#kafka-monitring-via-ambari"></a> Kafka Monitoring via Ambari</h1><p>Metric definitions:</p><p><a href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-user-guide/content/grafana_kfka_hosts.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-user-guide/content/grafana_kfka_hosts.html</a></p><p>Bytes In</p>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
            <tag> command cheetsheet </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari Server to Ambari Agent communication protected by SSL</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari-ServerAgent_SSL/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari-ServerAgent_SSL/</url>
      
        <content type="html"><![CDATA[<p><a href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-security/content/optional_set_up_two-way_ssl_between_ambari_server_and_ambari_agents.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-security/content/optional_set_up_two-way_ssl_between_ambari_server_and_ambari_agents.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> SSL </tag>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kerberize Ambari cluster</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/AmbariCluster_Kerberos/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/AmbariCluster_Kerberos/</url>
      
        <content type="html"><![CDATA[<h1 id="kerberos"><a class="markdownIt-Anchor" href="#kerberos"></a> Kerberos</h1><p>Kafka SASL relies on a Kerberized cluster.</p><h2 id="configurations-for-enable-kerberos-via-ambari-wizard"><a class="markdownIt-Anchor" href="#configurations-for-enable-kerberos-via-ambari-wizard"></a> Configurations for enabling Kerberos via the Ambari wizard</h2><table><thead><tr><th>Configuration Name</th><th>Value</th></tr></thead><tbody><tr><td>KDC Type</td><td>Existing Active Directory</td></tr><tr><td>KDC hosts</td><td>kdcserver1,kdcserver2</td></tr><tr><td>Realm name</td><td>DOMAINNAME.CAPITAL.NET</td></tr><tr><td>LDAP URL</td><td>ldaps://ldapserver1.domainname.capital.net:636</td></tr><tr><td>Container DN</td><td>OU=AmbariCluster, DC=net</td></tr><tr><td>Domains</td><td>DOMAINNAME</td></tr><tr><td>Kadmin host</td><td>kdcserver1</td></tr><tr><td>Admin principal</td><td>SUPERUSER</td></tr><tr><td>Admin password</td><td>password</td></tr></tbody></table><h2 id="mandatory-configuration-for-nifi-when-kerberos-is-enabled"><a class="markdownIt-Anchor" href="#mandatory-configuration-for-nifi-when-kerberos-is-enabled"></a> Mandatory configuration for NiFi when Kerberos is enabled</h2><h3 id="specify-the-kerberos-provider"><a class="markdownIt-Anchor" href="#specify-the-kerberos-provider"></a> Specify the Kerberos provider</h3><p>Make sure the kerberos-provider details are defined in “Template for login-identity-providers.xml”.</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">provider</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span 
class="name">identifier</span>&gt;</span>kerberos-provider<span class="tag">&lt;/<span class="name">identifier</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">class</span>&gt;</span>org.apache.nifi.kerberos.KerberosProvider<span class="tag">&lt;/<span class="name">class</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Default Realm"</span>&gt;</span>DOMAINNAME.CAPITAL.NET<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Authentication Expiration"</span>&gt;</span>12 hours<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">provider</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="check-the-user-mapping"><a class="markdownIt-Anchor" href="#check-the-user-mapping"></a> Check the user mapping</h3><p>After Kerberos is enabled, the LDAP user name used to log in may contain the domain, for example username@domain.com.</p><p>This name might not match the authorization policies in Ranger.</p><p>To solve this, configure identity mapping for NiFi:</p><p><a href="https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_security/content/identity-mapping.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_security/content/identity-mapping.html</a></p><p>With identity mapping, when a user name and password are entered in the NiFi login form, the resulting identity is matched against a regular expression and only the captured part (for example, the user name without the realm) is passed to Ranger for authorization.</p><p>For example:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$</span><br><span class="line">nifi.security.identity.mapping.value.dn=$1</span><br><span class="line">nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$</span><br><span class="line">nifi.security.identity.mapping.value.kerb=$1</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> SSL </tag>
            
            <tag> hortonworks </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari UI integrate with AD</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari_UI_AD/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari_UI_AD/</url>
      
        <content type="html"><![CDATA[<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">[root@hdf-server01 ambari-server]# ambari-server setup-ldap</span><br><span class="line">Using python  /usr/bin/python</span><br><span class="line">Setting up LDAP properties...</span><br><span class="line">Primary URL* &#123;host:port&#125;: ldapServer:389</span><br><span class="line">Secondary URL &#123;host:port&#125; :</span><br><span class="line">Use SSL* [true/false] (false):</span><br><span class="line">User object class* (person):</span><br><span class="line">User name attribute* (sAMAccountName):</span><br><span class="line">Group object class* (group):</span><br><span class="line">Group name attribute* (cn):</span><br><span class="line">Group member attribute* (member):</span><br><span class="line">Distinguished name attribute* ():CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span 
class="line">Referral method [follow/ignore] (ignore):</span><br><span class="line">Bind anonymously* [true/false] (false):</span><br><span class="line">Handling behavior for username collisions [convert/skip] for LDAP sync* (skip):</span><br><span class="line">Manager DN* :CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">Enter Manager Password* :</span><br><span class="line">Re-enter password:</span><br><span class="line">====================</span><br><span class="line">Review Settings</span><br><span class="line">====================</span><br><span class="line">authentication.ldap.managerDn: CN=UserToPullUserData,OU=IT Department,DC=hortonworks</span><br><span class="line">Base DN* :OU=IT Department,OU=IT hortonworks,DC=hortonworks</span><br><span class="line">authentication.ldap.managerPassword: *****</span><br><span class="line">Save settings [y/n] (y)?</span><br><span class="line">Saving...done</span><br><span class="line">Ambari Server 'setup-ldap' completed successfully.</span><br></pre></td></tr></table></figure><p>After setup, restart Ambari Server and then run the sync command to import the LDAP users and groups.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ambari-server restart</span><br><span class="line">ambari-server sync-ldap --all</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
            <tag> security integration </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari UI protect by SSL</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari_UI_HTTPS/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ambari_UI_HTTPS/</url>
      
        <content type="html"><![CDATA[<p><a href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-security/content/optional_set_up_ssl_for_ambari.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-security/content/optional_set_up_ssl_for_ambari.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> SSL </tag>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Enable NiFi Security</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/EnableNifiSecurity/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/EnableNifiSecurity/</url>
      
        <content type="html"><![CDATA[<h1 id="nifi-security"><a class="markdownIt-Anchor" href="#nifi-security"></a> NiFi Security</h1><p><a href="https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_security/content/enabling-ssl-without-ca.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_security/content/enabling-ssl-without-ca.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> security integration </tag>
            
            <tag> nifi </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka and Kafka Client communication protected by SASL</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Kafka_SASL_PLAINTEXT/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Kafka_SASL_PLAINTEXT/</url>
      
        <content type="html"><![CDATA[<h1 id="kafka-sasl-configurations"><a class="markdownIt-Anchor" href="#kafka-sasl-configurations"></a> Kafka SASL configurations</h1><h2 id="pre-requirements"><a class="markdownIt-Anchor" href="#pre-requirements"></a> Prerequisites</h2><p>Kafka SASL requires the Ambari cluster to be Kerberized.</p><h2 id="enable-sasl_plaintext"><a class="markdownIt-Anchor" href="#enable-sasl_plaintext"></a> Enable SASL_PLAINTEXT</h2><ul><li><p>Add the listener below to the Kafka listeners list:</p><p>SASL_PLAINTEXT://localhost:6669</p></li><li><p>security.inter.broker.protocol=SASL_PLAINTEXT</p><p>After the Kerberos wizard runs, the value is PLAINTEXTSASL; it should be changed to SASL_PLAINTEXT.</p><p>Open question: should it be changed to PLAINTEXT for performance?</p></li></ul><h3 id="test-the-sasl_plaintext"><a class="markdownIt-Anchor" href="#test-the-sasl_plaintext"></a> Test SASL_PLAINTEXT</h3><h4 id="test-from-commandline"><a class="markdownIt-Anchor" href="#test-from-commandline"></a> Test from the command line</h4><ul><li>Turn on the Ranger Kafka plugin.</li><li>Check the current Kerberos principal with klist (make sure you are not in a sudo shell):</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">klist</span><br><span class="line">Ticket cache: FILE:/tmp/krb5cc_239506898_CkQLn0</span><br><span class="line">Default principal: myusername@COMPANY.DOMAIN.COM</span><br><span class="line"></span><br><span class="line">Valid starting     Expires            Service principal</span><br><span class="line">17/08/17 09:31:23  17/08/17 19:31:23  krbtgt/COMPANY.DOMAIN.COM@COMPANY.DOMAIN.COM</span><br><span class="line">        renew until 24/08/17 09:31:23</span><br></pre></td></tr></table></figure><ul><li>In 
Ranger, check that the current user has access to the topic.</li></ul><p>Permissions list:<br>Publish, Consume, Configure, Describe, Create, Delete, Kafka Admin</p><ul><li>Run the console producer</li></ul><blockquote><p>Specify the protocol using --security-protocol.<br>Use fully qualified domain names when listing the brokers.</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-producer.sh --broker-list broker1.domain.net:6669,broker2.domain.net:6669 --topic topicname --security-protocol SASL_PLAINTEXT</span><br><span class="line">Msg1</span><br><span class="line">Msg2</span><br></pre></td></tr></table></figure><p>Correspondingly, consume from the command line:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh \</span><br><span class="line">--bootstrap-server broker1.domain.net:6669,broker2.domain.net:6669 \</span><br><span class="line">--topic topicname --from-beginning --security-protocol SASL_PLAINTEXT</span><br><span class="line">Msg1</span><br><span class="line">Msg2</span><br></pre></td></tr></table></figure><h4 id="test-from-java-client"><a class="markdownIt-Anchor" href="#test-from-java-client"></a> Test from a Java client</h4><p>Using a keytab is recommended for PROD.<br>In the test environment, we copy the keytab from the Linux server to the client side.</p><ul><li>Use ktutil to check the principal in the keytab<br><a href="https://docs.oracle.com/cd/E19683-01/806-4078/6jd6cjs1q/index.html" 
target="_blank" rel="noopener">https://docs.oracle.com/cd/E19683-01/806-4078/6jd6cjs1q/index.html</a></li></ul><p>Add the following option to the client-side JVM:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">-Djava.security.auth.login.config=/opt/certificates/kafka_SASL/kafka_client_jaas.conf</span><br></pre></td></tr></table></figure><p>The kafka_client_jaas.conf looks like this:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">KafkaClient &#123;</span><br><span class="line">    com.sun.security.auth.module.Krb5LoginModule required</span><br><span class="line">    useKeyTab=true</span><br><span class="line">    storeKey=true</span><br><span class="line">    keyTab=&quot;/opt/certificates/kafka_SASL/kafka.service.keytab&quot;</span><br><span class="line">    principal=&quot;kafka/broker1.company.domain.com@COMPANY.DOMAIN.COM&quot;;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>The /opt/certificates/kafka_SASL/kafka.service.keytab is copied from a server that has access to the Kafka service. Alternatively, we can generate a keytab for a specific service account in AD and grant that account access via Ranger.</p><p>In the Java client code, add the following properties. “kafka” is the service name defined in kafka_jaas.conf under /usr/hdf/current/kafka-broker/conf.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">props.put(<span class="string">"security.protocol"</span>, <span class="string">"SASL_PLAINTEXT"</span>);</span><br><span class="line">props.put(<span class="string">"sasl.kerberos.service.name"</span>, <span class="string">"kafka"</span>);</span><br></pre></td></tr></table></figure><h2 id="back-compatible"><a class="markdownIt-Anchor" href="#back-compatible"></a> Backward compatibility</h2><p>Once the Ranger Kafka plugin is turned on, clients connecting through the PLAINTEXT port are treated as the user ANONYMOUS. If the PLAINTEXT port should remain accessible, we need to grant the ANONYMOUS user access in Ranger.</p><blockquote><p>Manually add a user called “ANONYMOUS” in Ranger and grant this user the corresponding access in the policy.</p></blockquote>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
            <tag> SASL </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka and Kafka Client communication protected by SSL</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Kafka_SSL/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Kafka_SSL/</url>
      
        <content type="html"><![CDATA[<h1 id="enable-ssl-for-kafka-and-kafka-client-communication"><a class="markdownIt-Anchor" href="#enable-ssl-for-kafka-and-kafka-client-communication"></a> Enable SSL for Kafka and Kafka Client Communication</h1><h2 id="scripts-self-signed-certificates-and-keystores-and-truststores"><a class="markdownIt-Anchor" href="#scripts-self-signed-certificates-and-keystores-and-truststores"></a> Scripts to generate self-signed certificates, keystores and truststores</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -validity 365 -genkey -keyalg RSA -dname "CN=broker, OU=kafka" -keypass SuperTrust11 -storepass SuperTrust11</span><br><span class="line">openssl req -new -x509 -keyout ca-key -out ca-cert -days 365 -passout pass:"SuperTrust11" -subj "/C=AU/ST=WA/L=Perth/O=kafka/CN=broker"</span><br><span class="line"></span><br><span class="line">keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert -storepass SuperTrust11</span><br><span class="line">keytool -keystore kafka.client.truststore.jks -alias CARoot -import -file ca-cert -storepass SuperTrust11</span><br><span class="line"></span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -certreq -file cert-file -storepass SuperTrust11</span><br><span class="line">openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial -passin pass:SuperTrust11</span><br><span class="line"></span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert -storepass SuperTrust11</span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -import -file cert-signed -storepass SuperTrust11</span><br><span class="line"></span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -validity 365 -genkey -keyalg RSA -dname "CN=client, OU=kafka" -keypass SuperTrust11 -storepass SuperTrust11</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -certreq -file client-cert-file -storepass SuperTrust11</span><br><span class="line">openssl x509 -req -CA ca-cert -CAkey ca-key -in client-cert-file -out client-cert-signed -days 365 -CAcreateserial -passin pass:SuperTrust11</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias CARoot -import -file ca-cert -storepass SuperTrust11</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -import -file client-cert-signed -storepass SuperTrust11</span><br><span class="line"></span><br><span class="line">mkdir -p /etc/security/certificates/kafka</span><br><span class="line">cp /home/company.net/rachel/cert-kafka/kafka.server.*.jks /etc/security/certificates/kafka</span><br><span class="line">chown -R kafka:kafka /etc/security/certificates/kafka</span><br><span class="line">ls -l /etc/security/certificates/kafka</span><br><span class="line"></span><br><span class="line">mkdir -p /etc/security/certificates/kafkaClient</span><br><span class="line">cp /home/company.net/rachel/cert-kafka/kafka.client.*.jks /etc/security/certificates/kafkaClient</span><br><span class="line">chown -R kafka:kafka /etc/security/certificates/kafkaClient</span><br><span class="line">ls -l /etc/security/certificates/kafkaClient</span><br></pre></td></tr></table></figure>
<p>Add the following configuration to the Kafka broker config:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">ssl.keystore.location = /etc/security/certificates/kafka/kafka.server.keystore.jks</span><br><span class="line">ssl.keystore.password = SuperTrust11</span><br><span class="line">ssl.key.password = SuperTrust11</span><br><span class="line">ssl.truststore.location = /etc/security/certificates/kafka/kafka.server.truststore.jks</span><br><span class="line">ssl.truststore.password = SuperTrust11</span><br></pre></td></tr></table></figure><h2 id="test-one-way-ssl-default"><a class="markdownIt-Anchor" href="#test-one-way-ssl-default"></a> Test one-way SSL (default)</h2><h3 id="java-client-connect-to-kafka-via-ssl"><a class="markdownIt-Anchor" href="#java-client-connect-to-kafka-via-ssl"></a> Connect a Java client to Kafka via SSL</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">       <span class="comment">// configure the following three settings for SSL encryption</span></span><br><span class="line">       props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, <span class="string">"SSL"</span>);</span><br><span class="line">       props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, <span class="string">"/opt/certificates/kafka/kafka.client.truststore.jks"</span>);</span><br><span class="line">       props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, <span class="string">"SuperTrust11"</span>);</span><br><span class="line"></span><br><span class="line">       <span class="comment">// the following three settings are only needed for SSL client authentication (two-way SSL)</span></span><br><span class="line">       props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, <span class="string">"/opt/certificates/kafka/kafka.client.keystore.jks"</span>);</span><br><span class="line">       props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, <span class="string">"SuperTrust11"</span>);</span><br><span class="line">       props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, <span class="string">"SuperTrust11"</span>);</span><br></pre></td></tr></table></figure>
<h2 id="command-line-client-connect-to-kafka-server-via-ssl"><a class="markdownIt-Anchor" href="#command-line-client-connect-to-kafka-server-via-ssl"></a> Connect a command-line client to the Kafka server via SSL</h2><p><a href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/ch_wire-kafka.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/ch_wire-kafka.html</a></p><p>Starting from the default producer.properties and consumer.properties files under /usr/hdf/current/kafka-broker/conf, add the following lines to each file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br></pre></td><td class="code"><pre><span class="line">security.protocol=SSL</span><br><span class="line">ssl.truststore.location=/etc/security/certificates/kafkaClient/kafka.client.truststore.jks</span><br><span class="line">ssl.truststore.password=SuperTrust11</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span><span class="bash">keystore is only needed when the broker sets ssl.client.auth=required (two-way SSL)</span></span><br><span class="line"><span class="meta">#</span><span class="bash">ssl.keystore.location=/etc/security/certificates/kafkaClient/kafka.client.keystore.jks</span></span><br><span class="line"><span class="meta">#</span><span class="bash">ssl.keystore.password=SuperTrust11</span></span><br><span class="line"><span class="meta">#</span><span class="bash">ssl.key.password=SuperTrust11</span></span><br></pre></td></tr></table></figure><p>Then start the producer:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-producer.sh --broker-list broker1:6668,broker2:6668 --topic testtopic --producer.config /usr/hdf/current/kafka-broker/conf/producer.properties --security-protocol SSL</span><br></pre></td></tr></table></figure><p>Start the consumer:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">/usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh \</span><br><span class="line">--bootstrap-server broker1:6668,broker2:6668 \</span><br><span class="line">--topic testtopic \</span><br><span class="line">--from-beginning --new-consumer --security-protocol SSL \</span><br><span class="line">--consumer.config /usr/hdf/current/kafka-broker/conf/consumer.properties</span><br></pre></td></tr></table></figure>
<h2 id="enable-two-way-ssl"><a class="markdownIt-Anchor" href="#enable-two-way-ssl"></a> Enable two-way SSL</h2><p>To enable two-way SSL:</p><ol><li>Add “ssl.client.auth=required” to the broker settings.</li><li>The broker must trust the client certificate. This is already the case: the client certificate is signed by the same CA (ca-cert) that was imported into kafka.server.truststore.jks when the keystores and truststores were generated.</li></ol><h3 id="connect-from-command-line"><a class="markdownIt-Anchor" href="#connect-from-command-line"></a> Connect from the command line</h3><p>Add both the keystore and the truststore to the producer and consumer property files:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">security.protocol=SSL</span><br><span class="line">ssl.truststore.location=/etc/security/certificates/kafkaClient/kafka.client.truststore.jks</span><br><span class="line">ssl.truststore.password=SuperTrust11</span><br><span class="line"></span><br><span class="line">ssl.keystore.location=/etc/security/certificates/kafkaClient/kafka.client.keystore.jks</span><br><span class="line">ssl.keystore.password=SuperTrust11</span><br><span class="line">ssl.key.password=SuperTrust11</span><br></pre></td></tr></table></figure><p>Using the same commands as for one-way SSL, we can then produce and consume messages without issue.</p>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> SSL </tag>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Nifi UI Authentication integrated with LDAP</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/NifiCanvas_AD_LDAPs/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/NifiCanvas_AD_LDAPs/</url>
      
        <content type="html"><![CDATA[<h1 id="configure-nifi-using-ldap-to-do-authentication-to-web-ui"><a class="markdownIt-Anchor" href="#configure-nifi-using-ldap-to-do-authentication-to-web-ui"></a> Configure NiFi to use LDAP for Web UI authentication</h1><h2 id="configure-from-ambari-console"><a class="markdownIt-Anchor" href="#configure-from-ambari-console"></a> Configure from the Ambari console</h2><p>Ambari -&gt; NiFi -&gt; Configs -&gt; Advanced nifi-login-identity-providers-env</p><h3 id="using-ldap"><a class="markdownIt-Anchor" href="#using-ldap"></a> Using LDAP</h3><ul><li>If using LDAP, the TLS keystore and truststore properties can simply reuse the server’s keystore and truststore with no further configuration.</li></ul><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">loginIdentityProviders</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">provider</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">identifier</span>&gt;</span>ldap-provider<span class="tag">&lt;/<span 
class="name">identifier</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">class</span>&gt;</span>org.apache.nifi.ldap.LdapProvider<span class="tag">&lt;/<span class="name">class</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Authentication Strategy"</span>&gt;</span>SIMPLE<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Manager DN"</span>&gt;</span>CN=TheUserUsedToConnectToAD,OU=IT Accounts,OU=Iron Ore,DC=hortonworks,DC=net<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Manager Password"</span>&gt;</span>PasswordForTheUserUsedToConnectToAD<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore"</span>&gt;</span>/etc/security/certificates/nifi/nifi.server.keystore.jks<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore Password"</span>&gt;</span>KeyStorePass<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore Type"</span>&gt;</span>JKS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span 
class="attr">name</span>=<span class="string">"TLS - Truststore"</span>&gt;</span>/etc/security/certificates/nifi/nifi.server.truststore.jks<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Truststore Password"</span>&gt;</span>TrustStorePass<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Truststore Type"</span>&gt;</span>JKS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Client Auth"</span>&gt;</span><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Protocol"</span>&gt;</span>TLS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Shutdown Gracefully"</span>&gt;</span><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Referral Strategy"</span>&gt;</span>FOLLOW<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Connect Timeout"</span>&gt;</span>10 secs<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span 
class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Read Timeout"</span>&gt;</span>10 secs<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Url"</span>&gt;</span>ldap://ldap.ent.hortonworks.net:389<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"User Search Base"</span>&gt;</span>OU=Iron Ore,DC=hortonworks,DC=net<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"User Search Filter"</span>&gt;</span>sAMAccountName=&#123;0&#125;<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Identity Strategy"</span>&gt;</span>USE_USERNAME<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Authentication Expiration"</span>&gt;</span>12 hours<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">provider</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">loginIdentityProviders</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="using-ldaps"><a class="markdownIt-Anchor" href="#using-ldaps"></a> using LDAPs</h3><ul><li>if use LDAPs, we should import AD’s certificates into server’s truststore</li></ul><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">[root@nifi-server01 ~]# /usr/lib/jvm/java-1.8.0-oracle/bin/keytool -import -file /usr/hdf/current/ranger-usersync/conf/symantec-intermediate-ca.cer -alias symantec-intermediate-ca -keystore /etc/security/certificates/nifi/nifi.server.truststore.jks</span><br><span class="line">[root@nifi-server01 ~]# /usr/lib/jvm/java-1.8.0-oracle/bin/keytool -import -file /usr/hdf/current/ranger-usersync/conf/symantec-root-ca.cer -alias symantec-root-ca -keystore /etc/security/certificates/nifi/nifi.server.truststore.jks</span><br></pre></td></tr></table></figure><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">loginIdentityProviders</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">provider</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">identifier</span>&gt;</span>ldap-provider<span class="tag">&lt;/<span 
class="name">identifier</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">class</span>&gt;</span>org.apache.nifi.ldap.LdapProvider<span class="tag">&lt;/<span class="name">class</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Authentication Strategy"</span>&gt;</span>LDAPS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Manager DN"</span>&gt;</span>CN=TheUserUsedToConnectToAD,OU=IT Accounts,OU=Iron Ore,DC=hortonworks,DC=net<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Manager Password"</span>&gt;</span>PasswordForTheUserUsedToConnectToAD<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore"</span>&gt;</span>/etc/security/certificates/nifi/nifi.server.keystore.jks<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore Password"</span>&gt;</span>KeyStorePass<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Keystore Type"</span>&gt;</span>JKS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span 
class="attr">name</span>=<span class="string">"TLS - Truststore"</span>&gt;</span>/etc/security/certificates/nifi/nifi.server.truststore.jks<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Truststore Password"</span>&gt;</span>TrustStorePass<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Truststore Type"</span>&gt;</span>JKS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Client Auth"</span>&gt;</span><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Protocol"</span>&gt;</span>TLS<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"TLS - Shutdown Gracefully"</span>&gt;</span><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Referral Strategy"</span>&gt;</span>FOLLOW<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Connect Timeout"</span>&gt;</span>10 secs<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span 
class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Read Timeout"</span>&gt;</span>10 secs<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Url"</span>&gt;</span>ldaps://ldaps.ent.hortonworks.net:636<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"User Search Base"</span>&gt;</span>OU=Iron Ore,DC=hortonworks,DC=net<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"User Search Filter"</span>&gt;</span>sAMAccountName=&#123;0&#125;<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Identity Strategy"</span>&gt;</span>USE_USERNAME<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span> <span class="attr">name</span>=<span class="string">"Authentication Expiration"</span>&gt;</span>12 hours<span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">provider</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">loginIdentityProviders</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 id="reference-links"><a class="markdownIt-Anchor" href="#reference-links"></a> Reference Links</h2><p><a href="https://pierrevillard.com/2017/01/24/integration-of-nifi-with-ldap/" target="_blank" 
rel="noopener">https://pierrevillard.com/2017/01/24/integration-of-nifi-with-ldap/</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> hortonworks </tag>
            
            <tag> nifi </tag>
            
            <tag> ldap </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ranger Sync user and group information from LDAP for use in Authentication</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ranger_AD_LDAPs/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ranger_AD_LDAPs/</url>
      
        <content type="html"><![CDATA[<h1 id="ranger-configuration"><a class="markdownIt-Anchor" href="#ranger-configuration"></a> Ranger configuration</h1><h2 id="configure-ranger-to-sync-usergroup-from-ldap"><a class="markdownIt-Anchor" href="#configure-ranger-to-sync-usergroup-from-ldap"></a> Configure Ranger to sync users/groups from LDAP</h2><h3 id="connection-parameters"><a class="markdownIt-Anchor" href="#connection-parameters"></a> Connection parameters</h3><ul><li><p><strong>LDAP URL</strong></p><ul><li>ldaps://ldaps.hortonworks.net:636</li></ul></li><li><p><strong>Binding user (sample distinguished name)</strong></p><ul><li>CN=ADMINUSERTOPULLUSER,OU=IT Accounts,DC=hortonworks,DC=net</li></ul></li><li><p><strong>Parameters for user sync configuration</strong></p><ul><li>User Attribute: sAMAccountName</li><li>User Object Class: person</li><li>User Search Base: OU=IT Accounts,DC=hortonworks,DC=net</li><li>User Search Filter: cn=*</li><li>User Search Scope: sub</li><li>User Group Name Attribute: memberof</li></ul></li><li><p><strong>Parameters for group sync configuration</strong></p><ul><li>Group Member Attribute: member</li><li>Group Name Attribute: cn</li><li>Group Object Name: group</li><li>Group Search Base: DC=hortonworks,DC=net</li><li>Group Search Filter: cn=*</li></ul></li></ul><h3 id="ranger-truststore-configuration"><a class="markdownIt-Anchor" href="#ranger-truststore-configuration"></a> Ranger truststore configuration</h3><p>Because we are using LDAPS, we need to import the AD certificates into Ranger’s truststore.</p><p>Check the following configuration in the Ambari admin console:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ranger.usersync.truststore.file</span><br><span 
class="line">ranger.usersync.truststore.password</span><br></pre></td></tr></table></figure><p>Make sure Truststore file exist and password is correct.</p><p>If you have existing trust store file, you can import the certification manually if needed.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">[root@rangerServer01 ~]# /usr/lib/jvm/java-1.8.0-oracle/bin/keytool -import -file /usr/hdf/current/ranger-usersync/conf/symantec-intermediate-ca.cer -alias symantec-intermediate-ca -keystore /usr/hdf/current/ranger-usersync/conf/mytruststore.jks</span><br><span class="line">[root@rangerServer01 ~]# /usr/lib/jvm/java-1.8.0-oracle/bin/keytool -import -file /usr/hdf/current/ranger-usersync/conf/symantec-root-ca.cer -alias symantec-root-ca -keystore /usr/hdf/current/ranger-usersync/conf/mytruststore.jks</span><br></pre></td></tr></table></figure><p>Restart Ranger, and check the ranger user sync log at /var/log/ranger/usersync/usersync.log</p><p>Login Ranger to check the users and groups are successfully syncronized.</p>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> hortonworks </tag>
            
            <tag> ranger </tag>
            
            <tag> ldap </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>enable SSL for Ranger Web UI</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ranger_UI_HTTPS/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Security/Ranger_UI_HTTPS/</url>
      
        <content type="html"><![CDATA[<p>Turn on HTTPS for the Ranger Admin UI using a self-signed certificate:</p><p><a href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/configure_ambari_ranger_ssl_self_signed_cert_admin.html" target="_blank" rel="noopener">https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/configure_ambari_ranger_ssl_self_signed_cert_admin.html</a></p><p>When keytool asks for the first and last name, enter the fully qualified host name of the Ranger Admin server. The country code must be a two-letter ISO code.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">[root@ranger-server01 conf]# keytool -genkey -keyalg RSA -alias rangeradmin -keystore ranger-admin-keystore.jks -storepass xasecure -validity 360 -keysize 2048</span><br><span class="line">What is your first and last name?</span><br><span class="line">  [Unknown]:  ranger-server01.hortonworks.net</span><br><span class="line">What is the name of your organizational unit?</span><br><span class="line">  [Unknown]:  IT</span><br><span class="line">What is the name of your organization?</span><br><span class="line">  [Unknown]:  HORTONWORKS</span><br><span class="line">What is the name of your City or Locality?</span><br><span class="line">  [Unknown]:  Perth</span><br><span class="line">What is the name of your State or Province?</span><br><span class="line">  [Unknown]:  WA</span><br><span class="line">What is the two-letter country code for this unit?</span><br><span class="line">  [Unknown]:  AU</span><br><span class="line">Is CN=ranger-server01.hortonworks.net, OU=IT, O=HORTONWORKS, L=Perth, ST=WA, C=AU correct?</span><br><span class="line">  [no]:  yes</span><br><span class="line"></span><br><span class="line">Enter key password for &lt;rangeradmin&gt;</span><br><span class="line">        (RETURN if same as keystore password):</span><br></pre></td></tr></table></figure>
<h2 id="reset-the-environment-for-ranger"><a class="markdownIt-Anchor" href="#reset-the-environment-for-ranger"></a> Reset the environment for Ranger</h2><p>If the Ranger configuration is wrong, you can reset it:</p><ol><li>Back up the Ranger database.</li><li>Stop Ranger.</li><li>Drop the database, recreate it, and grant the corresponding privileges to the Ranger database user.</li><li>Start Ranger.</li></ol><p>The tables are recreated and configured automatically when Ranger starts.</p>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> SSL </tag>
            
            <tag> hortonworks </tag>
            
            <tag> ranger </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Nifi Data migration</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Nifi/Nifi-Migrating/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Nifi/Nifi-Migrating/</url>
      
        <content type="html"><![CDATA[<h1 id="nifi-data-migration"><a class="markdownIt-Anchor" href="#nifi-data-migration"></a> Nifi Data migration</h1><p>Requirement: migrate an existing NiFi flow into a clustered environment without data loss.</p><p><a href="https://community.hortonworks.com/questions/63745/migrating-nifi-flow-files-between-servers.html" target="_blank" rel="noopener">https://community.hortonworks.com/questions/63745/migrating-nifi-flow-files-between-servers.html</a></p><h1 id="nifi-dump"><a class="markdownIt-Anchor" href="#nifi-dump"></a> Nifi Dump</h1><p><code>nifiDump.sh</code> collects ten NiFi thread dumps, about one per minute, into timestamped files under /tmp:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> get 10 thread dumps, sleeping 60 seconds between them</span></span><br><span class="line">for i in &#123;1..10&#125;</span><br><span class="line">do</span><br><span class="line">  echo "start dump"</span><br><span class="line">  /usr/hdf/current/nifi/bin/nifi.sh dump /tmp/nifi_Dump_$(date +"%Y_%m_%d_%H_%M_%S")</span><br><span class="line">  echo "finished, sleep 60s"</span><br><span class="line">  sleep 60s</span><br><span class="line">done</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> nifi </tag>
            
            <tag> migration </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Kafka Security</title>
      <link href="2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/Kafka-Security/"/>
      <url>2017/07/10/markdown/TechByVendorName/hortonworks/Kafka/Kafka-Security/</url>
      
        <content type="html"><![CDATA[<h1 id="kafka-security"><a class="markdownIt-Anchor" href="#kafka-security"></a> Kafka Security</h1><p>Reference Link</p><p><a href="https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/" target="_blank" rel="noopener">https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/</a></p><blockquote><p>For broker/ZooKeeper communication, we will only require Kerberos authentication as TLS is only supported in ZooKeeper 3.5, which is still at the alpha release stage.</p></blockquote><p><strong>Note</strong>: the ZooKeeper version shipped with Hortonworks HDF 3.0 is 3.4.6, so broker/ZooKeeper communication can use Kerberos authentication but not TLS.</p><p>The script below creates a self-signed CA with openssl, signs the server and client certificates with it, and imports the CA certificate and the signed certificates into the corresponding keystores and truststores. When <code>openssl req</code> prompts for the PEM pass phrase of ca-key, enter the value of PASSWORD, because the signing steps pass it with <code>-passin</code>.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">#!/bin/bash</span><br><span class="line">PASSWORD=test1234</span><br><span class="line">VALIDITY=365</span><br><span class="line"># generate keystore for localhost ; valid for 365 days</span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -validity $VALIDITY -genkey</span><br><span class="line"># generate client keystore ; valid for 365 days</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -validity $VALIDITY -genkey</span><br><span class="line"></span><br><span class="line">####### generate key store for server and client; valid for 365 days ######</span><br><span class="line"></span><br><span class="line"># generate new X509 certificate with cert and keys ; valid for 365 days</span><br><span class="line">openssl req -new -x509 -keyout ca-key -out ca-cert -days $VALIDITY</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">####### generate x509 CA certificate ; valid for 365 days #####</span><br><span class="line"></span><br><span class="line"># generate truststore for server, trust ca-cert as CARoot</span><br><span class="line">keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert</span><br><span class="line"># generate truststore for client, trust ca-cert as CARoot</span><br><span class="line">keytool -keystore kafka.client.truststore.jks -alias CARoot -import -file ca-cert</span><br><span class="line"></span><br><span class="line">###### both server and client trust ca-cert by import the ca-cert into truststore #####</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"># generate a cert-file 
for server</span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -certreq -file cert-file</span><br><span class="line"># using ca-cert and password to sign server&apos;s cert-file ; valid for 365 days</span><br><span class="line">openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days $VALIDITY -CAcreateserial -passin pass:$PASSWORD</span><br><span class="line"># import ca-cert as CARoot into server&apos;s keystore</span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert</span><br><span class="line"># import ca signed cert-signed into server&apos;s keystore</span><br><span class="line">keytool -keystore kafka.server.keystore.jks -alias localhost -import -file cert-signed</span><br><span class="line"></span><br><span class="line">###### using ca credential to sign certificate for server; import both ca-cert and ca-signed certificate into server keystore  ######</span><br><span class="line"></span><br><span class="line"># generate a cert-file for client</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -certreq -file cert-file</span><br><span class="line"># using ca-cert and password to sign client&apos;s cert-file; valid for 365 days</span><br><span class="line">openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days $VALIDITY -CAcreateserial -passin pass:$PASSWORD</span><br><span class="line"># import ca-cert as CARoot into client&apos;s keystore</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias CARoot -import -file ca-cert</span><br><span class="line"># import ca signed cert-signed into client&apos;s keystore</span><br><span class="line">keytool -keystore kafka.client.keystore.jks -alias localhost -import -file cert-signed</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">###### using ca credential to sign certificate for 
client; import both ca-cert and ca-signed certificate into client keystore  ######</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> security </tag>
            
            <tag> hortonworks </tag>
            
            <tag> kafka </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ambari SNMP Alert Setting</title>
      <link href="2017/07/05/markdown/TechByVendorName/hortonworks/Ambari/AmbariSnmpTesting/"/>
      <url>2017/07/05/markdown/TechByVendorName/hortonworks/Ambari/AmbariSnmpTesting/</url>
      
        <content type="html"><![CDATA[<h1 id="ambari-snmp-alert-setting"><a class="markdownIt-Anchor" href="#ambari-snmp-alert-setting"></a> Ambari SNMP Alert Setting</h1><h1 id="test-environment"><a class="markdownIt-Anchor" href="#test-environment"></a> Test Environment</h1><h2 id="prepare-the-snmp-test-server"><a class="markdownIt-Anchor" href="#prepare-the-snmp-test-server"></a> Prepare the SNMP Test Server</h2><ul><li>Install the SNMP packages:</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">yum install net-snmp net-snmp-utils net-snmp-libs -y</span><br></pre></td></tr></table></figure><ul><li>Disable authorization in /etc/snmp/snmptrapd.conf so that all received traps are accepted and logged (for a test server only):</li></ul> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"># authCommunity   log,execute,net public</span><br><span class="line"># traphandle SNMPv2-MIB::coldStart    /usr/bin/bin/my_great_script cold</span><br><span class="line">disableAuthorization yes</span><br></pre></td></tr></table></figure><ul><li>Add the Ambari MIB definition.<br>The current version of Ambari (2.4.2) does not ship the MIB definition file, so copy its content manually from the Ambari JIRA attachment:<br><a href="https://issues.apache.org/jira/secure/attachment/12761892/APACHE-AMBARI-MIB.txt" target="_blank" rel="noopener">https://issues.apache.org/jira/secure/attachment/12761892/APACHE-AMBARI-MIB.txt</a></li></ul> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/share/snmp/mibs/APACHE-AMBARI-MIB.txt</span><br><span class="line">chmod 644 /usr/share/snmp/mibs/APACHE-AMBARI-MIB.txt</span><br></pre></td></tr></table></figure>
<ul><li>Start the SNMP trap daemon, logging all traps to /tmp/traps.log for testing purposes:</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nohup snmptrapd -m ALL -A -n -Lf /tmp/traps.log &amp;</span><br></pre></td></tr></table></figure><ul><li>Test that the SNMP server recognizes the Ambari MIB.</li></ul><p>On the SNMP server, run the following command:</p> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">snmptrap -v 2c -c public localhost '' APACHE-AMBARI-MIB::apacheAmbariAlert alertDefinitionName s "definitionName" alertDefinitionHash s "definitionHash" alertName s "name" alertText s "text" alertState i 0 alertHost s "host" alertService s "service" alertComponent s "component"</span><br></pre></td></tr></table></figure><p>Then check that the trap is logged in /tmp/traps.log:</p> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"> 2017-07-05 12:04:17 UDP: [127.0.0.1]:48795-&gt;[127.0.0.1]:162 [UDP: [127.0.0.1]:48795-&gt;[127.0.0.1]:162]:</span><br><span class="line">DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (379025195) 
43 days, 20:50:51.95       SNMPv2-MIB::snmpTrapOID.0 = OID: APACHE-AMBARI-MIB::apacheAmbariAlert   APACHE-AMBARI-MIB::alertDefinitionName = STRING: &quot;definitionName&quot;       APACHE-AMBARI-MIB::alertDefinitionHash = STRING: &quot;definitionHash&quot;       APACHE-AMBARI-MIB::alertName = STRING: &quot;name&quot;   APACHE-AMBARI-MIB::alertText = STRING: &quot;text&quot;   APACHE-AMBARI-MIB::alertState = INTEGER: ok(0)  APACHE-AMBARI-MIB::alertHost = STRING: &quot;host&quot;   APACHE-AMBARI-MIB::alertService = STRING: &quot;service&quot;     APACHE-AMBARI-MIB::alertComponent = STRING: &quot;component&quot;</span><br></pre></td></tr></table></figure><h2 id="define-notification-from-ambari-server"><a class="markdownIt-Anchor" href="#define-notification-from-ambari-server"></a> Define notification from Ambari Server</h2><p>In Ambari, open <strong>&quot;Alerts-&gt;Actions-&gt;Manage Notifications&quot;</strong> and define an SNMP notification that targets the SNMP test server.</p><p>![DefineAmbariSNMPNotification.PNG]</p><h2 id="test-ambari-cluster-notification-can-be-sent-out"><a class="markdownIt-Anchor" href="#test-ambari-cluster-notification-can-be-sent-out"></a> Test that Ambari cluster notifications are sent</h2><p>Stop a Kafka broker, then check /tmp/traps.log on the SNMP server:</p> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">2017-07-05 10:47:48 UDP: [10.241.212.35]:39631-&gt;[10.241.86.87]:162 [UDP: [10.241.222.35]:39631-&gt;[10.241.86.87]:162]:</span><br><span class="line">SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::org.5.1.3.1.1232.1 SNMPv2-SMI::org.5.1.3.1.1232.1 = STRING: 
&quot;</span><br><span class="line"></span><br><span class="line">[Alert] Kafka Broker Process</span><br><span class="line">[Service] KAFKA</span><br><span class="line">[Component] KAFKA_BROKER</span><br><span class="line">[Host] KafkaHost1</span><br><span class="line"></span><br><span class="line">Connection failed: [Errno 111] Connection refused to KafkaHost1:6667</span><br><span class="line">    &quot;   SNMPv2-SMI::org.5.1.3.1.1232.1 = STRING: &quot;</span><br><span class="line">      [CRITICAL] Kafka Broker Process</span><br><span class="line">    &quot;</span><br></pre></td></tr></table></figure><h2 id="reference-url"><a class="markdownIt-Anchor" href="#reference-url"></a> Reference Url</h2><p><a href="https://community.hortonworks.com/articles/74370/snmp-alert.html" target="_blank" rel="noopener">https://community.hortonworks.com/articles/74370/snmp-alert.html</a></p><p><a href="https://github.com/apache/ambari/tree/trunk/contrib/alert-snmp-mib" target="_blank" rel="noopener">https://github.com/apache/ambari/tree/trunk/contrib/alert-snmp-mib</a></p><!--images git[DefineAmbariSNMPNotification.PNG]:images/AmbariSnmpTesting/DefineAmbariSNMPNotification.PNG-->]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> ambari </tag>
            
            <tag> snmp </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Nifi Kafka Cluster Test Cases</title>
      <link href="2017/06/15/markdown/TechByVendorName/hortonworks/ClusterTesting/Nifi_Kafka_ClusterTest/"/>
      <url>2017/06/15/markdown/TechByVendorName/hortonworks/ClusterTesting/Nifi_Kafka_ClusterTest/</url>
      
        <content type="html"><![CDATA[<h1 id="nifi-kafka-cluster-test-cases"><a class="markdownIt-Anchor" href="#nifi-kafka-cluster-test-cases"></a> Nifi Kafka Cluster Test Cases</h1><h2 id="topology"><a class="markdownIt-Anchor" href="#topology"></a> Topology</h2><p>Due to environment limitations, the test environment consists of two Linux servers forming an Ambari cluster with:</p><ul><li>a ZooKeeper cluster shared by the NiFi and Kafka clusters,</li><li>a NiFi cluster with two nodes,</li><li>a Kafka cluster.</li></ul><h2 id="message-sequencing-tests"><a class="markdownIt-Anchor" href="#message-sequencing-tests"></a> Message Sequencing Tests</h2><h3 id="message-sequencing-test-topic-without-partition"><a class="markdownIt-Anchor" href="#message-sequencing-test-topic-without-partition"></a> Message Sequencing Test: topic with a single partition</h3><h4 id="test0-basic-scenario-subscribe-process-kafka-messages-from-nifi-cluster-single-threading"><a class="markdownIt-Anchor" href="#test0-basic-scenario-subscribe-process-kafka-messages-from-nifi-cluster-single-threading"></a> Test 0, basic scenario: subscribe &amp; process Kafka messages from the NiFi cluster (single threading)</h4><p><img src="https://user-images.githubusercontent.com/5424421/27468239-4aae6112-581b-11e7-9dd2-7ecdd11b378c.png" alt="image"></p><ul><li>Pre-step: put 10,000 messages sequentially into TopicA and 10,000 messages into TopicB<ul><li>both topics have 2 replicas and a single partition</li><li>messages are published in sequence (Msg1, Msg2, …)</li></ul></li><li>Test step: start the NiFi flows<ul><li>one flow subscribes to TopicA and publishes the consumed messages to TopicA.D</li><li>one flow subscribes to TopicB and publishes the consumed messages to TopicB.D</li><li>all NiFi processors run on the NiFi cluster and are configured with <strong>“Concurrent Tasks = 1” and Execution = &quot;Primary node&quot;</strong><br><img src="https://user-images.githubusercontent.com/5424421/27468280-90249414-581b-11e7-8193-aa19336f44c1.png" alt="image"></li><li>the subscribe processor is set to consume from “earliest” with a unique client ID.</li></ul></li><li>After the test step, read the messages from TopicA.D and TopicB.D and verify that they are in sequence.</li></ul><p><strong>Conclusion</strong>: the messages kept their sequence, even though NiFi has 2 instances.</p>
<h4 id="test-1-huge-volume-50k-of-messages-passing-through"><a class="markdownIt-Anchor" href="#test-1-huge-volume-50k-of-messages-passing-through"></a> Test 1, huge volume (50k) of messages passing through</h4><p>Same as above, but with 50k messages. Passed.</p><h4 id="test-2-subscribe-process-kafka-messages-from-nifi-cluster-multi-thread-allowed"><a class="markdownIt-Anchor" href="#test-2-subscribe-process-kafka-messages-from-nifi-cluster-multi-thread-allowed"></a> Test 2, Subscribe &amp; Process Kafka messages from Nifi Cluster (multi-thread allowed)</h4><ul><li>Pre-step: put 20,000 messages sequentially into TopicA<ul><li>the topic has 2 replicas and a single partition</li><li>messages are published in sequence (Msg1, Msg2, …)</li></ul></li><li>Test step: start the NiFi flow<ul><li>the flow subscribes to TopicA and publishes the consumed messages to TopicA.D</li><li>all NiFi processors run on the NiFi cluster and are configured with <strong>“Concurrent Tasks = 1” and Execution = &quot;All nodes&quot;</strong></li><li>the subscribe processor is set to consume from “earliest” with a unique client ID.</li></ul></li><li>After the test step, read the messages from TopicA.D and verify that they are in sequence.</li></ul><p><strong>Conclusion</strong>: the messages kept their sequence, even though NiFi has 2 instances and the processors are allowed to run on both nodes.</p>
<h4 id="test-3-subscribe-process-kafka-messages-from-nifi-cluster-failover"><a class="markdownIt-Anchor" href="#test-3-subscribe-process-kafka-messages-from-nifi-cluster-failover"></a> Test 3, Subscribe &amp; Process Kafka messages from Nifi Cluster (failover)</h4><ul><li>Pre-step: put 40,000 messages sequentially into TopicA<ul><li>the topic has 2 replicas and a single partition</li><li>messages are published in sequence (Msg1, Msg2, …)</li><li>note the host name and process ID of the NiFi primary node</li></ul></li><li>Test step: start the NiFi flow and <strong>kill the primary NiFi instance in the middle of processing</strong><ul><li>the flow subscribes to TopicA and publishes the consumed messages to TopicA.D</li><li>all NiFi processors run on the NiFi cluster and are configured with <strong>“Concurrent Tasks = 1” and Execution = &quot;Primary node&quot;</strong></li><li>the subscribe processor is set to consume from “earliest” with a unique client ID.</li></ul></li><li>After the test step, read the messages from TopicA.D: although the NiFi instance was killed in the middle of message processing, the messages kept their sequence.</li></ul><h4 id="test-3-retest-increase-message-volume-to-50k-failover"><a class="markdownIt-Anchor" href="#test-3-retest-increase-message-volume-to-50k-failover"></a> Test 3 retest, increase message volume to 50k (failover)</h4><p>There was an issue with message processing, but it was solved after switching the message publisher to the correct version (Kafka 0.10).</p><p><strong>Conclusion</strong>: the messages kept their sequence; batching and transactions keep the messages in order.</p><h3 id="message-sequencing-tests-series-kafka-topics-with-partition"><a class="markdownIt-Anchor" href="#message-sequencing-tests-series-kafka-topics-with-partition"></a> Message Sequencing Test Series: Kafka Topics with Partitions</h3><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ kafka-topics.sh --create --zookeeper zServer1:2181,zServer2:2181 --replication-factor 2 --partitions 2 --topic TopicA</span><br></pre></td></tr></table></figure><p><img
src="https://user-images.githubusercontent.com/5424421/27468308-bb0251e4-581b-11e7-8ce0-9f96ce0e4759.png" alt="image"></p><p>Send messages with different keys and check how they are assigned to partitions.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">producer.send(<span class="keyword">new</span> ProducerRecord&lt;String, String&gt;(topic, <span class="string">"Avasdr"</span>, <span class="string">"Avasdr.Msg"</span> + i), <span class="keyword">new</span> Callback() &#123;</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">onCompletion</span><span class="params">(RecordMetadata metadata, Exception e)</span> </span>&#123;</span><br><span class="line">        <span class="keyword">if</span> (e != <span class="keyword">null</span>) &#123;</span><br><span class="line">            e.printStackTrace();</span><br><span class="line">            <span class="keyword">return</span>; <span class="comment">// metadata may be null when the send failed</span></span><br><span class="line">        &#125;</span><br><span class="line">        System.out.println(<span class="string">"key: "</span> + <span class="string">"Avasdr"</span> + <span class="string">", Partition: "</span> + metadata.partition());</span><br><span class="line">    &#125;</span><br><span class="line">&#125;);</span><br></pre></td></tr></table></figure><p><strong>Conclusion</strong>: keyed messages were distributed across the partitions, with several keys sharing a partition.</p><ul><li>For example, messages with 3 different keys were all sent to partition 0; with 5 different keys, the first 3 keys were assigned to partition 0 and the remaining 2 keys to partition 1.</li><li>This is not a round-robin rule: Kafka’s default partitioner chooses the partition from a hash of the key, so which keys share a partition depends on the key values themselves.</li></ul><p>Here’s an example of consuming messages from a specified partition with the Java client (topicName, partitionNum, numOfPolls and pollTimeoutMs are defined elsewhere):</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">TopicPartition partition = <span class="keyword">new</span> TopicPartition(topicName, partitionNum);</span><br><span class="line">consumer.assign(Arrays.asList(partition)); <span class="comment">// manual assignment: read only this partition</span></span><br><span class="line"><span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; numOfPolls; i++) &#123; <span class="comment">// each poll may return many records</span></span><br><span class="line">    ConsumerRecords&lt;String, String&gt; records = consumer.poll(pollTimeoutMs); <span class="comment">// waits up to pollTimeoutMs ms</span></span><br><span class="line">    <span class="keyword">for</span> (ConsumerRecord&lt;String, String&gt; record : records) &#123;</span><br><span class="line">        System.out.println(<span class="string">"message value,"</span> + record.value());</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<h2 id="message-sequencing-conclusion"><a class="markdownIt-Anchor" href="#message-sequencing-conclusion"></a> Message Sequencing Conclusion</h2><p>The easiest way to maintain message sequence while still allowing parallel processing is to publish messages to the Kafka topic with a key.<br>Messages with the same key are always sent to the same partition, and each partition is consumed by only one consumer of the NiFi Kafka client’s consumer group (even if multi-threading is allowed), so the order is preserved per key.</p><h2 id="reference-documents"><a class="markdownIt-Anchor" href="#reference-documents"></a> Reference Documents</h2><ul><li><a
href="https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/" target="_blank" rel="noopener">https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/</a></li><li><a href="https://howtoprogram.xyz/2016/06/04/write-apache-kafka-custom-partitioner/" target="_blank" rel="noopener">https://howtoprogram.xyz/2016/06/04/write-apache-kafka-custom-partitioner/</a></li></ul><blockquote><p>Although it’s possible to increase the number of partitions over time, one has to be careful if messages are produced with keys. When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. 
To avoid this situation, a common practice is to over-partition a bit.</p></blockquote><h2 id="site-to-site-protocol"><a class="markdownIt-Anchor" href="#site-to-site-protocol"></a> Site-to-site Protocol</h2><p><strong>Test Steps</strong></p><ul><li>prepare 50k messages in TopicA</li><li>subscribe messages from Nifi Cluster (concurrency=1, run on primary node), and then send to remote nifi via site-to-site protocol</li><li>remote nifi read the message and publish to TopicA.D</li><li>verify message sequence</li></ul><p><strong>Configuration</strong></p><ul><li>All kafka processors are using version 0.10</li><li>All publisher processors are setting to “Guarantee Replicated Delivery”</li><li>Set single threading and “FIFO”</li></ul><h3 id="site-to-site-push"><a class="markdownIt-Anchor" href="#site-to-site-push"></a> Site-to-site push</h3><h3 id="site-to-site-push-via-http"><a class="markdownIt-Anchor" href="#site-to-site-push-via-http"></a> Site-to-site push via HTTP</h3><h3 id="site-to-site-push-via-raw"><a class="markdownIt-Anchor" href="#site-to-site-push-via-raw"></a> Site-to-site push via RAW</h3><p>Passed.</p><p><img src="https://user-images.githubusercontent.com/5424421/27573921-ba90aa02-5b46-11e7-806f-cea9802621cb.png" alt="image"></p><h3 id="site-to-site-push-via-raw-failover"><a class="markdownIt-Anchor" href="#site-to-site-push-via-raw-failover"></a> Site-to-site push via RAW (failover)</h3><p>50k message<br>kill the primary nifi in middle of processing. Failed.<br>Result:   target topic get 50k messages. But sequence is wrong. 
Around 8k messages were received after the NiFi instance restarted.<br><img src="https://user-images.githubusercontent.com/5424421/27670978-452cf0bc-5cc3-11e7-89df-f412b1da2c41.png" alt="image"></p><h3 id="site-to-site-pull"><a class="markdownIt-Anchor" href="#site-to-site-pull"></a> Site-to-site pull</h3><h3 id="site-to-site-pull-via-http"><a class="markdownIt-Anchor" href="#site-to-site-pull-via-http"></a> Site-to-site pull via HTTP</h3><p>Passed.<br><img src="https://user-images.githubusercontent.com/5424421/27573894-a0af1484-5b46-11e7-83ee-5ff2bf5d4951.png" alt="image"></p><h3 id="site-to-site-pull-via-raw"><a class="markdownIt-Anchor" href="#site-to-site-pull-via-raw"></a> Site-to-site pull via RAW</h3>]]></content>
      
      
      
        <tags>
            
            <tag> cluster </tag>
            
            <tag> hortonworks </tag>
            
            <tag> nifi </tag>
            
            <tag> kafka </tag>
            
            <tag> message sequencing </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>common IOT scenario</title>
      <link href="2017/06/11/markdown/Trending/IOT/IOTWeek/"/>
      <url>2017/06/11/markdown/Trending/IOT/IOTWeek/</url>
      
        <content type="html"><![CDATA[<h1 id="common-iot-scenario"><a class="markdownIt-Anchor" href="#common-iot-scenario"></a> Common IoT scenario</h1><p>Device: sensor --&gt; microcontroller</p><h1 id="common-iot-protocols"><a class="markdownIt-Anchor" href="#common-iot-protocols"></a> Common IoT protocols</h1><p>Wi-Fi, Zigbee, MQTT, Sigfox</p><h1 id="demo-1"><a class="markdownIt-Anchor" href="#demo-1"></a> Demo 1</h1><ol><li>Connect the sensor to the microcontroller, and connect the microcontroller to the computer.</li><li>Upload the code to the device and test it while connected.</li></ol><ul><li>The sensor measures the distance between itself and the ground (or the nearest obstruction).</li></ul><h1 id="demo-2"><a class="markdownIt-Anchor" href="#demo-2"></a> Demo 2</h1><p>Building on Demo 1, send the data to a cloud platform.</p>]]></content>
      
      
      
        <tags>
            
            <tag> iot </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Big Data And Hadoop</title>
      <link href="2017/06/01/markdown/Trending/BigData/BigDataAndHadoop/"/>
      <url>2017/06/01/markdown/Trending/BigData/BigDataAndHadoop/</url>
      
        <content type="html"><![CDATA[<!-- TOC START min:1 max:3 link:true update:true --><ul><li><a href="#big-data-and-hadoop-for-beginners">Big data and Hadoop for beginners</a></li><li><a href="#big-data-overview">Big data Overview</a><ul><li><a href="#big-data-use-case">Big data use cases</a></li><li><a href="#big-data-jobs">Big data jobs</a></li><li><a href="#etl-vs-elt">ETL vs ELT</a></li><li><a href="#major-commercial-distributers-of-hadoop">Major Commercial Distributors of Hadoop</a></li></ul></li><li><a href="#hadoop-fundamental">Hadoop fundamentals</a></li><li><a href="#hadoop-ecosystem">Hadoop Ecosystem</a></li><li><a href="#hadoop-distributed-components">Hadoop distributed components</a></li><li><a href="#hadoop-data-processing-blocks">Hadoop data processing blocks</a></li><li><a href="#name-node-secondary-name-node">Name Node / Secondary Name Node</a></li></ul><!-- TOC END --><h1 id="big-data-and-hadoop-for-beginners"><a class="markdownIt-Anchor" href="#big-data-and-hadoop-for-beginners"></a> Big data and Hadoop for beginners</h1><p>This is based on:<br><a href="https://www.udemy.com/big-data-and-hadoop-for-beginners/learn/v4/overview" target="_blank" rel="noopener">https://www.udemy.com/big-data-and-hadoop-for-beginners/learn/v4/overview</a></p><h1 id="big-data-overview"><a class="markdownIt-Anchor" href="#big-data-overview"></a> Big data Overview</h1><ul><li>Structured data (databases, Excel), semi-structured data (XML, JSON), unstructured data (logs)</li><li>The 5 Vs of big data: Volume (terabytes to zettabytes); Velocity (the speed at which data is generated and moves around); Variety (structured to unstructured); Veracity (data quality); Value (the value extracted from the data)</li></ul><h2 id="big-data-use-case"><a class="markdownIt-Anchor" href="#big-data-use-case"></a> Big data use cases</h2><ul><li>Mobile advertising companies</li><li>Telco, finance, retail</li></ul>
<h2 id="big-data-jobs"><a class="markdownIt-Anchor" href="#big-data-jobs"></a> Big data jobs</h2><p>Big data analyst; Hadoop administrator; big data engineer; big data scientist;<br>big data manager; big data solution architect; chief data officer</p><h2 id="etl-vs-elt"><a class="markdownIt-Anchor" href="#etl-vs-elt"></a> ETL vs ELT</h2><ul><li>Traditional: ETL: Extract -&gt; Transform -&gt; Load into a data warehouse</li><li>Hadoop: ELT: Extract -&gt; Load -&gt; Transform</li></ul><h2 id="major-commercial-distributers-of-hadoop"><a class="markdownIt-Anchor" href="#major-commercial-distributers-of-hadoop"></a> Major Commercial Distributors of Hadoop</h2><ul><li>Amazon Elastic MapReduce (EMR)</li><li>Cloudera</li><li>Hortonworks</li><li>MapR Technologies (supports network file systems)</li><li>Pivotal</li><li>Teradata</li></ul><h1 id="hadoop-fundamental"><a class="markdownIt-Anchor" href="#hadoop-fundamental"></a> Hadoop fundamentals</h1><ul><li>HDFS</li><li>MapReduce<br>In Hadoop, data is distributed across the nodes, and the network carries the processing logic and the processed results rather than the raw data, which avoids the cost of moving large volumes of data.<br>Each mapper processes the data stored locally on its node and then sends its output to the reducers.</li></ul><h1 id="hadoop-ecosystem"><a class="markdownIt-Anchor" href="#hadoop-ecosystem"></a> Hadoop Ecosystem</h1><p><img src="https://cloud.githubusercontent.com/assets/5424421/26528807/983deb2c-43e5-11e7-9538-6953b5ca5f9a.png" alt="hadoopecosystem"></p><h1 id="image"><a class="markdownIt-Anchor" href="#image"></a> <img src="https://user-images.githubusercontent.com/5424421/27009414-260399c2-4ec0-11e7-9030-48d7be5852d7.png" alt="image"></h1><h1 id="hadoop-distributed-components"><a class="markdownIt-Anchor" href="#hadoop-distributed-components"></a> Hadoop distributed components</h1><ul><li>How the components are distributed (note which components run on the same node)</li><li>The TaskTracker acts as a slave to the JobTracker.</li><li>The DataNode daemon acts as a slave to the NameNode.</li><li>The NameNode stores the metadata (what data exists and where it is) and runs on the master node.</li></ul>
<h1 id="hadoop-data-processing-blocks"><a class="markdownIt-Anchor" href="#hadoop-data-processing-blocks"></a> Hadoop data processing blocks</h1><ul><li>How Hadoop splits big data and spreads it across nodes</li><li>A typical Unix file system uses a 4 KB block by default, whereas the default HDFS block size in Hadoop 1.x is 64 MB (128 MB from Hadoop 2.x).<br>Why does Hadoop use such large blocks? Fewer, larger blocks reduce the overhead of requesting and locating data.<br>By default, Hadoop stores 3 replicas of each data block.</li></ul><h1 id="name-node-secondary-name-node"><a class="markdownIt-Anchor" href="#name-node-secondary-name-node"></a> Name Node / Secondary Name Node</h1><p><img src="https://user-images.githubusercontent.com/5424421/27009460-d82b40b8-4ec1-11e7-98c9-2f756fc9dc97.png" alt="image"></p><ol><li>Failover</li><li>Runs in the background to merge the edit log into the FsImage (checkpointing); otherwise the master (NameNode) only does this at startup.</li></ol><!---images-->]]></content>
      
      
      
        <tags>
            
            <tag> bigdata </tag>
            
            <tag> hadoop </tag>
            
            <tag> trending </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Linux commands</title>
      <link href="2017/05/12/markdown/BackToBasic/Security/SymmetricVsAsymmetricEncryption/"/>
      <url>2017/05/12/markdown/BackToBasic/Security/SymmetricVsAsymmetricEncryption/</url>
      
        <content type="html"><![CDATA[<ul><li>Symmetric vs Asymmetric encryption</li></ul><p><a href="https://www.ssl2buy.com/wiki/symmetric-vs-asymmetric-encryption-what-are-differences" target="_blank" rel="noopener">https://www.ssl2buy.com/wiki/symmetric-vs-asymmetric-encryption-what-are-differences</a></p><ul><li>Symmetric encryption uses the same key for both encryption and decryption.</li><li>Asymmetric encryption uses a public/private key pair: data encrypted with one key can only be decrypted with the other.</li></ul><h1 id="x509-certification"><a class="markdownIt-Anchor" href="#x509-certification"></a> X.509 certificates</h1><p>An X.509 certificate contains a public key and an identity (a hostname, an organization, or an individual), and is either signed by a certificate authority or self-signed.</p><p>The structure of an X.509 v3 digital certificate is as follows:</p><ul><li>Certificate<ul><li>Version Number</li><li>Serial Number</li><li>Signature Algorithm ID</li><li>Issuer Name</li><li>Validity period<ul><li>Not Before</li><li>Not After</li></ul></li><li>Subject name</li><li>Subject Public Key Info<ul><li>Public Key Algorithm</li><li>Subject Public Key</li></ul></li><li>Issuer Unique Identifier (optional)</li><li>Subject Unique Identifier (optional)</li><li>Extensions (optional)<ul><li>…</li></ul></li></ul></li><li>Certificate Signature Algorithm</li><li>Certificate Signature</li></ul><h2 id="certificate-filename-extensions"><a class="markdownIt-Anchor" href="#certificate-filename-extensions"></a> Certificate filename extensions</h2><p>There are several commonly used filename extensions for X.509 certificates.
Unfortunately, some of these extensions are also used for other data such as private keys.</p><ul><li>.pem – (Privacy-enhanced Electronic Mail) Base64 encoded DER certificate, enclosed between “-----BEGIN CERTIFICATE-----” and “-----END CERTIFICATE-----”</li><li>.cer, .crt, .der – usually in binary DER form, but Base64-encoded certificates are common too (see .pem above)</li><li>.p7b, .p7c – PKCS#7 SignedData structure without data, just certificate(s) or CRL(s)</li><li>.p12 – PKCS#12, may contain certificate(s) (public) and private keys (password protected)</li><li>.pfx – PFX, predecessor of PKCS#12 (usually contains data in PKCS#12 format, e.g., with PFX files generated in IIS)</li></ul>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> Security </tag>
            
            <tag> Symmetric vs Asymmetric </tag>
            
            <tag> X509 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Hey, all StackOverflow</title>
      <link href="2017/05/05/readme/"/>
      <url>2017/05/05/readme/</url>
      
        <content type="html"><![CDATA[]]></content>
      
      
      
    </entry>
    
    
    
    <entry>
      <title>Hey, all StackOverflow</title>
      <link href="2017/05/05/markdown/README/"/>
      <url>2017/05/05/markdown/README/</url>
      
        <content type="html"><![CDATA[<h1 id="updates"><a class="markdownIt-Anchor" href="#updates"></a> Updates</h1><ul><li>12 May 2018<ul><li>Updated Hexo and the NexT theme; added Disqus</li></ul></li></ul><h1 id="how-to-update-the-hexo-theme"><a class="markdownIt-Anchor" href="#how-to-update-the-hexo-theme"></a> How to update Hexo &amp; the theme</h1><ol><li>Update Hexo</li></ol><p>Update the Hexo version in package.json.</p><ol start="2"><li>Clone the hexo branch locally and install the new theme</li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">git clone -b hexo git@github.com:racheliurui/racheliurui.github.io.git</span><br><span class="line">cd racheliurui.github.io</span><br><span class="line"></span><br><span class="line"># for example, to install NexT v5.1.2</span><br><span class="line">mkdir themes/next512</span><br><span class="line">curl -L https://api.github.com/repos/iissnan/hexo-theme-next/tarball/v5.1.2 | tar -zxv -C themes/next512 --strip-components=1</span><br><span class="line"></span><br><span class="line"># update racheliurui.github.io/_config.yml so that theme points to the new theme (folder name)</span><br><span class="line"># back up &amp; update the new theme&apos;s _config.yml (referring to the old theme&apos;s config file)</span><br><span class="line"># refer to the theme&apos;s GitHub website for any extra config</span><br><span class="line">git add -A &amp;&amp; git commit -m &quot;updates&quot;</span><br><span class="line">git push origin hexo</span><br></pre></td></tr></table></figure><h2 id="other-themes"><a class="markdownIt-Anchor" href="#other-themes"></a> Other themes</h2><p><a href="https://github.com/ptsteadman/hexo-theme-corporate-example"
target="_blank" rel="noopener">https://github.com/ptsteadman/hexo-theme-corporate-example</a></p><p><a href="https://www.duyidong.com/2017/03/07/Deploy-Hexo-to-S3/" target="_blank" rel="noopener">https://www.duyidong.com/2017/03/07/Deploy-Hexo-to-S3/</a></p><p><a href="https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html" target="_blank" rel="noopener">https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> Hexo </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Atom Settings</title>
      <link href="2017/05/05/markdown/IDEs/atom/AtomSetting/"/>
      <url>2017/05/05/markdown/IDEs/atom/AtomSetting/</url>
      
        <content type="html"><![CDATA[<h1 id="atom-behind-firewall"><a class="markdownIt-Anchor" href="#atom-behind-firewall"></a> Atom behind a firewall</h1><p><a href="https://discuss.atom.io/t/is-there-any-proxy-settings/710/59" target="_blank" rel="noopener">https://discuss.atom.io/t/is-there-any-proxy-settings/710/59</a></p><p>ENVIRONMENT: Windows 7</p><p>Browse to your .atom directory (for me it was C:\Users\USERNAME\.atom).<br>Create a new file named ‘.apmrc’.<br>Open it and add:<br>https-proxy=http://USERNAME:PASSWORD@domain:port</p>]]></content>
      
      
      
        <tags>
            
            <tag> ide </tag>
            
            <tag> atom </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Intellij Settings</title>
      <link href="2017/05/05/markdown/IDEs/intellij/intellijSettings/"/>
      <url>2017/05/05/markdown/IDEs/intellij/intellijSettings/</url>
      
        <content type="html"><![CDATA[<!-- TOC START min:1 max:3 link:true update:true --><ul><li><a href="#normal-setting-after-install">Normal settings after install</a><ul><li><a href="#encoding">Encoding</a></li><li><a href="#git-ignore">git ignore</a></li></ul></li><li><a href="#import-existing-projects">import existing projects</a><ul><li><a href="#set-folder-structure">set folder structure</a></li></ul></li><li><a href="#issues">issues</a><ul><li><a href="#issue-with-jdk-version-while-building">Issue with the JDK version while building</a></li></ul></li></ul><!-- TOC END --><h1 id="normal-setting-after-install"><a class="markdownIt-Anchor" href="#normal-setting-after-install"></a> Normal settings after install</h1><h2 id="encoding"><a class="markdownIt-Anchor" href="#encoding"></a> Encoding</h2><p>Settings -&gt; Editor -&gt; File Encodings</p><h2 id="git-ignore"><a class="markdownIt-Anchor" href="#git-ignore"></a> git ignore</h2><p>Use the following .gitignore templates:</p><p><a href="https://github.com/github/gitignore/blob/master/Global/JetBrains.gitignore" target="_blank" rel="noopener">https://github.com/github/gitignore/blob/master/Global/JetBrains.gitignore</a></p><p><a href="https://github.com/github/gitignore/blob/master/Java.gitignore" target="_blank" rel="noopener">https://github.com/github/gitignore/blob/master/Java.gitignore</a></p><h1 id="import-existing-projects"><a class="markdownIt-Anchor" href="#import-existing-projects"></a> import existing projects</h1><h3 id="set-folder-structure"><a class="markdownIt-Anchor" href="#set-folder-structure"></a> set folder structure</h3><h1 id="issues"><a class="markdownIt-Anchor" href="#issues"></a> issues</h1><h2 id="issue-with-jdk-version-while-building"><a class="markdownIt-Anchor" href="#issue-with-jdk-version-while-building"></a> Issue with the JDK version while building</h2><ul><li>Exception</li></ul> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span
class="line">diamond operator is not supported in -source 1.5</span><br></pre></td></tr></table></figure><ul><li>Resolution</li></ul><p>Raise the Java source/target level to 1.7 or higher in the project’s pom.xml:<br><a href="https://stackoverflow.com/questions/29258141/maven-compilation-error-use-source-7-or-higher-to-enable-diamond-operator/31734791#31734791" target="_blank" rel="noopener">https://stackoverflow.com/questions/29258141/maven-compilation-error-use-source-7-or-higher-to-enable-diamond-operator/31734791#31734791</a></p><p>There are two options; both have the same effect.</p><p>Option 1,</p> <figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">properties</span>&gt;</span></span><br><span class="line">      <span class="tag">&lt;<span class="name">maven.compiler.source</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">maven.compiler.source</span>&gt;</span></span><br><span class="line">      <span class="tag">&lt;<span class="name">maven.compiler.target</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">maven.compiler.target</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">properties</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Option 2,</p> <figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">build</span>&gt;</span></span><br><span class="line">       <span
class="tag">&lt;<span class="name">plugins</span>&gt;</span></span><br><span class="line">           <span class="tag">&lt;<span class="name">plugin</span>&gt;</span></span><br><span class="line">               <span class="tag">&lt;<span class="name">artifactId</span>&gt;</span>maven-compiler-plugin<span class="tag">&lt;/<span class="name">artifactId</span>&gt;</span></span><br><span class="line">               <span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">                   <span class="tag">&lt;<span class="name">source</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">source</span>&gt;</span></span><br><span class="line">                   <span class="tag">&lt;<span class="name">target</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">target</span>&gt;</span></span><br><span class="line">               <span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br><span class="line">           <span class="tag">&lt;/<span class="name">plugin</span>&gt;</span></span><br><span class="line">       <span class="tag">&lt;/<span class="name">plugins</span>&gt;</span></span><br><span class="line">   <span class="tag">&lt;/<span class="name">build</span>&gt;</span></span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> ide </tag>
            
            <tag> intellj </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Linux commands</title>
      <link href="2017/04/15/markdown/BackToBasic/Linux/CommandsCheetSheet/"/>
      <url>2017/04/15/markdown/BackToBasic/Linux/CommandsCheetSheet/</url>
      
        <content type="html"><![CDATA[<h1 id="for-redhat-linux"><a class="markdownIt-Anchor" href="#for-redhat-linux"></a> For Red Hat Linux</h1><ul><li><p>List all the repositories</p></li><li><p>Check the size and usage of the file system mounted at /tmp (for the size of a folder itself, use du -sh):<br>findmnt /tmp -o SOURCE,FSTYPE,SIZE,USED,AVAIL,USE%,TARGET</p></li><li><p>CentOS minimal may not include netstat; install it with:</p></li></ul><p>yum install net-tools</p><p><a href="https://cyruslab.net/2014/07/11/installing-netstat-on-centos-7-minimal-installation/" target="_blank" rel="noopener">https://cyruslab.net/2014/07/11/installing-netstat-on-centos-7-minimal-installation/</a></p><h1 id="list-block-device"><a class="markdownIt-Anchor" href="#list-block-device"></a> List block devices</h1><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">lsblk</span><br></pre></td></tr></table></figure><p><a href="https://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-lvm.html" target="_blank" rel="noopener">https://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-lvm.html</a></p><p>Values seen in the TYPE column include:</p><ul><li>disk</li><li>part</li><li>lvm</li><li>rom</li></ul><h1 id="ssh-tunnels"><a class="markdownIt-Anchor" href="#ssh-tunnels"></a> SSH Tunnels</h1><p><a href="https://scriptingosx.com/2017/07/ssh-tunnels/" target="_blank" rel="noopener">https://scriptingosx.com/2017/07/ssh-tunnels/</a></p><p>Forward local port 8080 to port 80 on the remote host (-N: do not execute a remote command):</p><figure class="highlight plain"><figcaption><span>shell</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ ssh -N -L localhost:8080:localhost:80 -i ~/.ssh/ec2Test.pem ec2-user@ec2publicip.us-east-2.compute.amazonaws.com</span><br></pre></td></tr></table></figure><p>Set up a test environment</p>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> linux </tag>
            
            <tag> shell </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Linux commands</title>
      <link href="2017/04/15/markdown/BackToBasic/Security/CommandsCheetSheet/"/>
      <url>2017/04/15/markdown/BackToBasic/Security/CommandsCheetSheet/</url>
      
        <content type="html"><![CDATA[<h1 id="for-redhat-linux"><a class="markdownIt-Anchor" href="#for-redhat-linux"></a> For Red Hat Linux</h1><ul><li>List all the repositories</li></ul><h1 id="common-linux"><a class="markdownIt-Anchor" href="#common-linux"></a> Common Linux</h1><ul><li>Check file system size and usage</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">findmnt /tmp -o SOURCE,FSTYPE,SIZE,USED,AVAIL,USE%,TARGET</span><br><span class="line">findmnt /var -o AVAIL</span><br></pre></td></tr></table></figure><ul><li>Check a user’s groups, and add a user to a group</li></ul> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">$ groups username</span><br><span class="line">$ sudo usermod -a -G groupname username</span><br></pre></td></tr></table></figure><ul><li>Check CPU and RAM information</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cat /proc/cpuinfo</span><br><span class="line">cat /proc/meminfo</span><br></pre></td></tr></table></figure><h2 id="subscription-manager"><a class="markdownIt-Anchor" href="#subscription-manager"></a> subscription-manager</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">subscription-manager repos --disable '*Hortonworks*'</span><br><span class="line">subscription-manager repos --enable Hortonworks_Ambari_2_5_1_0_RHEL7 --enable Hortonworks_Ambari_2_4_2_0_RHEL7 --enable Hortonworks_HDP-UTILS_1_1_0_21_RHEL7</span><br><span class="line"><span class="meta">#</span><span class="bash">check existing package included <span
class="keyword">in</span> certain repo</span></span><br><span class="line">yum list available | grep Hortonworks_Ambari_2_5_1_0_RHEL7</span><br></pre></td></tr></table></figure><ul><li>Find which process is using port 7000</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> fuser 7000/tcp</span></span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> linux </tag>
            
            <tag> shell </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Ansible Samples</title>
      <link href="2017/04/15/markdown/BackToBasic/Linux/Ansible/AnsibleSamples/"/>
      <url>2017/04/15/markdown/BackToBasic/Linux/Ansible/AnsibleSamples/</url>
      
        <content type="html"><![CDATA[<h1 id="ansible-samples"><a class="markdownIt-Anchor" href="#ansible-samples"></a> Ansible Samples</h1><ul><li>System Check Samples</li></ul><p><a href="https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/main.yml" target="_blank" rel="noopener">https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/main.yml</a></p><p><a href="http://docs.ansible.com/ansible/latest/setup_module.html" target="_blank" rel="noopener">http://docs.ansible.com/ansible/latest/setup_module.html</a></p><p><a href="https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/check-requirements.yml" target="_blank" rel="noopener">https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/check-requirements.yml</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> linux </tag>
            
            <tag> automation </tag>
            
            <tag> Ansible </tag>
            
            <tag> yaml </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Make use of the open sourced wrapper</title>
      <link href="2017/04/15/markdown/TechByVendorName/hortonworks/Nifi/RunNifiAsWindowsService/"/>
      <url>2017/04/15/markdown/TechByVendorName/hortonworks/Nifi/RunNifiAsWindowsService/</url>
      
        <content type="html"><![CDATA[<h1 id="make-use-of-the-open-sourced-wrapper"><a class="markdownIt-Anchor" href="#make-use-of-the-open-sourced-wrapper"></a> Make use of the open-source wrapper</h1><p><a href="https://github.com/kohsuke/winsw/blob/master/doc/installation.md" target="_blank" rel="noopener">https://github.com/kohsuke/winsw/blob/master/doc/installation.md</a></p><h1 id="prepare-nifi-windows-server"><a class="markdownIt-Anchor" href="#prepare-nifi-windows-server"></a> Prepare the NiFi Windows server</h1><ol><li>Make sure JDK 1.8 or above is installed on the Windows server and is on the path of the account running the service. If not, change conf/bootstrap.conf to point to the correct JDK.</li></ol><p><strong>At this stage, running with only a JRE has not been tested.</strong></p><ol start="2"><li>Here’s an example configuration file, nifi.xml; put it in the NiFi root folder:</li></ol> <figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">service</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">id</span>&gt;</span>nifi<span class="tag">&lt;/<span class="name">id</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">name</span>&gt;</span>nifi<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">description</span>&gt;</span>this service runs nifi solution<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">env</span> <span
class="attr">name</span>=<span class="string">"APP_HOME"</span> <span class="attr">value</span>=<span class="string">"%BASE%"</span>/&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">logpath</span>&gt;</span>%BASE%\logs<span class="tag">&lt;/<span class="name">logpath</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">logmode</span>&gt;</span>rotate<span class="tag">&lt;/<span class="name">logmode</span>&gt;</span>       </span><br><span class="line">     <span class="tag">&lt;<span class="name">executable</span>&gt;</span>%BASE%/jre1.8.0_121/bin/java.exe<span class="tag">&lt;/<span class="name">executable</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">arguments</span>&gt;</span>-cp %BASE%\conf;%BASE%\lib\bootstrap\* -Xms12m -Xmx24m -Dorg.apache.nifi.bootstrap.config.log.dir=%BASE%\logs -Dorg.apache.nifi.bootstrap.config.pid.dir=%BASE%\run       -Dorg.apache.nifi.bootstrap.config.file=%BASE%\conf\bootstrap.conf org.apache.nifi.bootstrap.RunNiFi Start<span class="tag">&lt;/<span class="name">arguments</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">service</span>&gt;</span></span><br></pre></td></tr></table></figure><ol start="3"><li><p>Put winsw-2.1.0-bin.exe in the NiFi root folder and rename it to nifi.exe (WinSW looks for an XML configuration file with the same base name as the executable).</p></li><li><p>From a Windows command prompt, run nifi.exe install to install the service:</p></li></ol> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nifi.exe install</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> hortonworks </tag>
            
            <tag> nifi </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>microservices architecture reading</title>
      <link href="2017/04/12/markdown/BackToBasic/Architecture/Microservices/"/>
      <url>2017/04/12/markdown/BackToBasic/Architecture/Microservices/</url>
      
        <content type="html"><![CDATA[<p><a href="https://www.youtube.com/watch?v=CZ3wIuvmHeM" target="_blank" rel="noopener">https://www.youtube.com/watch?v=CZ3wIuvmHeM</a></p><p>Hystrix: fault-tolerance library<br>FIT (Fault Injection Testing)<br>Critical microservices: increase their availability<br>Client libraries: keep them simple</p><p>CAP Theorem: In the presence of a network partition, you must choose between consistency and availability.<br>Netflix’s solution: eventual consistency (tech stack: Cassandra)</p><p>Infrastructure<br>Multi-region strategy</p><p>Stateless service</p><ul><li>Not a cache or database</li><li>Frequently accessed metadata</li><li>No instance affinity</li><li>Losing a node is a non-event</li></ul><p>Stateful service</p><ul><li>Databases &amp; caches</li><li>Custom apps that hold large amounts of data</li><li>Losing a node is a notable event</li></ul><p>EVCache</p><ul><li>Writes are replicated to multiple availability zones</li><li>Reads are served from the local zone</li></ul><p>Improving EVCache: separate different types of requests</p><p>Spinnaker</p><p>Conway’s Law<br>Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.<br>Any piece of software reflects the organizational structure that produced it.</p>]]></content>
      
      
      
        <tags>
            
            <tag> architecture </tag>
            
            <tag> Microservices </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>What is Kerberos and how it is designed</title>
      <link href="2017/04/05/markdown/BackToBasic/Kerbose/Kerbose/"/>
      <url>2017/04/05/markdown/BackToBasic/Kerbose/Kerbose/</url>
      
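        <content type="html"><![CDATA[<!-- TOC START min:1 max:3 link:true update:true --><ul><li><a href="#what-is-kerbose-and-how-it-is-designed">What is Kerberos and how it is designed</a><ul><li><a href="#designing-an-authentication-system-a-dialogue-in-four-scenes">Designing an Authentication System: a Dialogue in Four Scenes</a></li></ul></li><li><a href="#my-own-understanding">My own understanding</a><ul><li><a href="#scene-1">Scene 1</a></li><li><a href="#scene-2">Scene 2</a></li><li><a href="#scene-3">Scene 3</a></li><li><a href="#scene-4">Scene 4</a></li></ul></li></ul><!-- TOC END --><h1 id="what-is-kerbose-and-how-it-is-designed"><a class="markdownIt-Anchor" href="#what-is-kerbose-and-how-it-is-designed"></a> What is Kerberos and how it is designed</h1><h2 id="designing-an-authentication-system-a-dialogue-in-four-scenes"><a class="markdownIt-Anchor" href="#designing-an-authentication-system-a-dialogue-in-four-scenes"></a> Designing an Authentication System: a Dialogue in Four Scenes</h2><p><a href="https://web.mit.edu/kerberos/dialogue.html" target="_blank" rel="noopener">https://web.mit.edu/kerberos/dialogue.html</a></p><p><a href="http://www.xuebuyuan.com/748249.html" target="_blank" rel="noopener">http://www.xuebuyuan.com/748249.html</a></p><h1 id="my-own-understanding"><a class="markdownIt-Anchor" href="#my-own-understanding"></a> My own understanding</h1><h2 id="scene-1"><a class="markdownIt-Anchor" href="#scene-1"></a> Scene 1</h2><h2 id="scene-2"><a class="markdownIt-Anchor" href="#scene-2"></a> Scene 2</h2><ol><li>The client sends a request to the authentication server Charon, submitting the username, the password, and the name of the requested service.</li><li>Charon checks the username and password. If both are correct, it returns a ticket containing the username, encrypted with the service's secret key (Kerberos uses symmetric keys shared between Charon and each service, not a public/private key pair).</li><li>The client uses the ticket to request the service. The service decrypts the ticket with its secret key, obtains the username, and, if the check passes, provides the service to that user.</li></ol><p><strong>Flaw</strong>: The service cannot tell whether it decrypted the ticket correctly.</p><p><strong>Fix</strong>: The ticket contains {username, service name}. Because the service name is known in advance, the service can check that its own name appears in the decrypted content, which confirms that decryption succeeded.</p><p><strong>Flaw</strong>: If the ticket is intercepted, another client can use it to send requests.</p><p><strong>Fix</strong>: The ticket also includes the IP address of the client that made the original request: {username, client IP, service name}. A client that intercepts the ticket then fails validation when it sends a request, because its IP address does not match.</p><p><strong>New requirement</strong>: Requesting a new ticket for every service request is tedious. Even if a ticket can be reused for the same service, each different service still needs a new ticket. It is also insecure: the password is repeatedly sent to Charon in plaintext, so someone may figure it out.</p>
<h2 id="scene-3"><a class="markdownIt-Anchor" href="#scene-3"></a> Scene 3</h2><p><strong>Problems to solve</strong>:</p><ul><li>First constraint: users enter their password only once, when their workstation starts up. This means that requesting a ticket for a new service does not require entering the password again.</li><li>Second constraint: the password must never be sent over the network in plaintext.</li></ul><p><strong>Steps</strong></p><ol><li>The client requests service from Charon through kinit. After the user enters the username and password, kinit sends only the username and the request to Charon.</li><li>Charon encrypts a set of tickets with the user's password and returns them to kinit.</li><li>kinit decrypts Charon's response with the password the user entered in step 1. Once decryption succeeds, the tickets can be used to request the corresponding services.</li></ol><p>This solves both problems above: the password is entered only once and is never sent over the network in plaintext.</p><p><strong>Flaw</strong>: Storing many tickets locally is insecure. An attacker only needs to steal the tickets and spoof the user's IP address to access all of the user's services.</p><p><strong>Fix</strong>: Add a timestamp to each ticket and agree on a lifetime (for example, eight hours). Remember that the ticket contents are encrypted with the service's secret key, so after decrypting a ticket the service knows who requested it, when, and whether it has expired.</p><p><strong>Flaw</strong>: Until the ticket expires, however, it is still insecure.</p>
<h2 id="scene-4"><a class="markdownIt-Anchor" href="#scene-4"></a> Scene 4</h2><p><strong>Problem to solve</strong>:</p><p>The problem left over from Scene 3: before the ticket expires, it can be intercepted and the IP address spoofed.</p><p><strong>Steps</strong></p><ol><li>The client requests service from Charon through kinit. After the user enters the username and password, kinit sends only the username and the request to Charon.</li><li>Charon encrypts a set of tickets together with a session key using the user's password and returns them to kinit.<br>Charon's response: ([session key | ticket]) encrypted with the client's password, where ticket = {session key : username : address : service name : lifetime : timestamp} encrypted with the service's secret key.</li><li>kinit decrypts Charon's response with the password the user entered in step 1. After successful decryption it has, in addition to the ticket, a session key, which is the same session key that is sealed inside the ticket.</li><li>When requesting the service, the client encrypts {username : address} with the session key (this is called an authenticator) and submits it together with the ticket.</li><li>The service decrypts the ticket with its secret key to obtain the session key, then uses the session key to decrypt the username and address submitted with the ticket. If they match, the request is accepted.</li></ol><p><strong>Flaw</strong>: An attacker who copies the ticket together with the session-key-encrypted data and spoofs the IP address can still reuse them before the ticket expires, even without knowing the session key.<br>Fix: Add a timestamp to the content encrypted with the session key, valid for a short window such as two minutes. Every time the client encrypts with the session key, it includes a timestamp; after decrypting with the session key, the service checks whether the timestamp has expired. Even if an attacker captures a packet correctly encrypted with the session key, there is not enough time to replay it (replaying also requires spoofing the IP address and resending).</p><p><strong>What if the session key itself is intercepted?</strong></p><p>In practice this cannot happen, because the session key is encrypted with the client's key when it is returned to kinit, so only the genuine kinit client, which knows the user's password, can open the package and see the session key. Capturing the packet that contains the session key is useless.</p><p><strong>Flaw</strong>: This scenario protects the service from being used by impostor clients, but it does not guarantee that the client is talking to the real service. If the routing is tampered with, the user might send a document to be printed to a fake print service. How can this be solved?</p><p><strong>Solution</strong>:</p><p>The session key above solves this as well.<br>Before sending sensitive data, the client can ask the service to return a response encrypted with the session key. Because only the client and the genuine service know the session key, only the genuine service can encrypt content with it and send the correct response back to the client.<br>Problem solved.</p><p><strong>This kind of security scheme is called Kerberos.</strong></p>]]></content>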
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> kerberos </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Vagrant Configuration</title>
      <link href="2017/03/11/markdown/TechByVendorName/Vagrant/VagrantSetup/"/>
      <url>2017/03/11/markdown/TechByVendorName/Vagrant/VagrantSetup/</url>
      
        <content type="html"><![CDATA[<h1 id="install-vagrant"><a class="markdownIt-Anchor" href="#install-vagrant"></a> Install Vagrant</h1><h1 id="install-virtual-boxconfig-vagrant-behind-the-proxyhttpwwwnetinstructionscomrunning-vagrant-1-8-behind-a-proxyafter-proxy-being-setted-run-to-add-a-boxvagrant-box-add-hashicorpprecise64vagrant-up-to-start-the-vmexception-of303c768-error-rc-5673303c768-ntallocatevirtualmemory-0000000000400000-lb-0x1000-failed-with-rcnt0xc0000018-allocating-replacement-memory-for-working-around-buggy-protection-software-see-vboxstartuplog-for-more-details303c768-error-rc-5645303c768-too-many-virtual-memory-regions"><a class="markdownIt-Anchor" href="#install-virtual-boxconfig-vagrant-behind-the-proxyhttpwwwnetinstructionscomrunning-vagrant-1-8-behind-a-proxyafter-proxy-being-setted-run-to-add-a-boxvagrant-box-add-hashicorpprecise64vagrant-up-to-start-the-vmexception-of303c768-error-rc-5673303c768-ntallocatevirtualmemory-0000000000400000-lb-0x1000-failed-with-rcnt0xc0000018-allocating-replacement-memory-for-working-around-buggy-protection-software-see-vboxstartuplog-for-more-details303c768-error-rc-5645303c768-too-many-virtual-memory-regions"></a> Install VirtualBox</h1><p>Configure Vagrant behind the proxy:<br><a href="http://www.netinstructions.com/running-vagrant-1-8-behind-a-proxy/" target="_blank" rel="noopener">http://www.netinstructions.com/running-vagrant-1-8-behind-a-proxy/</a><br>After the proxy is set, run the following to add a box:<br>vagrant box add hashicorp/precise64<br>Then run vagrant up to start the VM.</p>
<p>Exception:<br>303c.768: Error (rc=-5673):<br>303c.768: NtAllocateVirtualMemory (0000000000400000 LB 0x1000) failed with rcNt=0xc0000018 allocating replacement memory for working around buggy protection software. See VBoxStartup.log for more details<br>303c.768: Error (rc=-5645):<br>303c.768: Too many virtual memory regions.</p><p>I tried other boxes and moved the default location of .vagrant.d (by default under the current domain user's profile) to avoid a security restriction:<br>set VAGRANT_HOME=somewhereelse<br>Then I ran the command to add other (32-bit) boxes from:<br><a href="http://www.vagrantbox.es/" target="_blank" rel="noopener">http://www.vagrantbox.es/</a></p><p>Vagrant</p><p>Work around the proxy</p><p><a href="https://runefs.com/2014/11/28/setting-up-vagrant-behind-a-corporate-proxy/" target="_blank" rel="noopener">https://runefs.com/2014/11/28/setting-up-vagrant-behind-a-corporate-proxy/</a></p><p>VERR_VMX_NO_VMX</p><p>VirtualBox (used by Vagrant) conflicts with Hyper-V on Windows 10; disable Hyper-V before running vagrant up.</p>]]></content>
      
      
      
        <tags>
            
            <tag> vagrant </tag>
            
            <tag> vm </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Connect IIB to Gmail</title>
      <link href="2017/03/05/markdown/TechByVendorName/IBM/IIBMail/"/>
      <url>2017/03/05/markdown/TechByVendorName/IBM/IIBMail/</url>
      
        <content type="html"><![CDATA[<h1 id="prepare-the-gmail-account-for-pop3"><a class="markdownIt-Anchor" href="#prepare-the-gmail-account-for-pop3"></a> Prepare the Gmail Account for POP3</h1><ul><li>Gmail settings: enable POP3</li><li>Enable less secure apps</li></ul><p><a href="https://myaccount.google.com/lesssecureapps" target="_blank" rel="noopener">https://myaccount.google.com/lesssecureapps</a></p><h1 id="prepare-the-truststore-for-iib"><a class="markdownIt-Anchor" href="#prepare-the-truststore-for-iib"></a> Prepare the TrustStore for IIB</h1><p><a href="https://www.avisi.nl/blog/2012/09/12/quick-way-to-retrieve-a-chain-of-ssl-certificates-from-a-server/" target="_blank" rel="noopener">https://www.avisi.nl/blog/2012/09/12/quick-way-to-retrieve-a-chain-of-ssl-certificates-from-a-server/</a></p><p>Retrieve the certificate chain from the Gmail POP server:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">openssl s_client -connect pop.gmail.com:995 -prexit -showcerts</span><br></pre></td></tr></table></figure><p>Then save each certificate in the chain as a separate .cer or .pem file and import them into the truststore.</p><h1 id="set-up-iib-run-time-env"><a class="markdownIt-Anchor" href="#set-up-iib-run-time-env"></a> Set up the IIB runtime environment</h1><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">mqsisetdbparms TESTNODE_Rachel -n gmailtestaccount -u myemailaddress@gmail.com -p mygmailpassword</span><br><span class="line"></span><br><span class="line">mqsisetdbparms TESTNODE_Rachel -n email::gmailtestaccount -u myemailaddress@gmail.com -p mygmailpassword</span><br></pre></td></tr></table></figure><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">mqsichangeproperties TESTNODE_Rachel -e default -o ComIbmJVMManager -n truststoreFile -v "D:\temp\certs\localhost.truststore.jks"</span><br><span class="line">mqsisetdbparms TESTNODE_Rachel -n brokerTruststore::password -u temp -p mytruststorepassword</span><br><span class="line">mqsichangeproperties TESTNODE_Rachel -e default -o ComIbmJVMManager -n truststorePass -v "brokerTruststore::password"</span><br><span class="line">mqsichangeproperties TESTNODE_Rachel -e default -o ComIbmJVMManager -n truststoreType -v JKS</span><br><span class="line">mqsichangeproperties TESTNODE_Rachel -e default -o ComIbmJVMManager -n jvmSystemProperty -v "-Dmail.smtp.auth=true -Dmail.smtp.starttls.enable=true"</span><br></pre></td></tr></table></figure><p>To check the current configuration:</p><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">mqsireportproperties TESTNODE_Rachel -o BrokerRegistry -a</span><br><span class="line">mqsireportproperties TESTNODE_Rachel -e default -o ComIbmJVMManager -a</span><br></pre></td></tr></table></figure><h1 id="configure-the-iib-email-input-node"><a class="markdownIt-Anchor" href="#configure-the-iib-email-input-node"></a> Configure the IIB Email Input node</h1><ul><li>Email Server: pop3s://pop.gmail.com:995</li><li>Security Identity: gmailtestaccount</li></ul><h1 id="reference-links"><a class="markdownIt-Anchor" href="#reference-links"></a> Reference Links</h1><p><a href="https://developer.ibm.com/answers/questions/210834/how-configure-email-input-node-to-receive-emails-w.html" target="_blank" rel="noopener">https://developer.ibm.com/answers/questions/210834/how-configure-email-input-node-to-receive-emails-w.html</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> ibm </tag>
            
            <tag> IIB </tag>
            
            <tag> pop3 </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>MongoDB Cluster Setup</title>
      <link href="2017/03/05/markdown/TechByVendorName/MongoDB/MongoDBCluster/"/>
      <url>2017/03/05/markdown/TechByVendorName/MongoDB/MongoDBCluster/</url>
      
        <content type="html"><![CDATA[<p><a href="http://www.alphadevx.com/a/491-Running-two-MongoDB-instances-on-one-server" target="_blank" rel="noopener">http://www.alphadevx.com/a/491-Running-two-MongoDB-instances-on-one-server</a><br>On Red Hat, the systemd service unit files are located in<br>/usr/lib/systemd/system</p><p><a href="https://severalnines.com/blog/turning-mongodb-replica-set-sharded-cluster" target="_blank" rel="noopener">https://severalnines.com/blog/turning-mongodb-replica-set-sharded-cluster</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> mongodb </tag>
            
            <tag> cluster </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Deploy rpm package to Nexus</title>
      <link href="2017/02/11/markdown/Java/Maven/Maven-Nexus/"/>
      <url>2017/02/11/markdown/Java/Maven/Maven-Nexus/</url>
      
        <content type="html"><![CDATA[<h2 id="deploy-rpm-package-to-nexus"><a class="markdownIt-Anchor" href="#deploy-rpm-package-to-nexus"></a> Deploy rpm package to Nexus</h2><h3 id="method-1"><a class="markdownIt-Anchor" href="#method-1"></a> Method 1</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">mvn deploy:deploy-file \</span><br><span class="line">    -DgroupId=com.github.diegopacheco.sandbox.devops \</span><br><span class="line">    -DartifactId=fpmtest \</span><br><span class="line">    -Dversion=1.0.0 \</span><br><span class="line">    -DgeneratePom=true \</span><br><span class="line">    -Dpackaging=rpm \</span><br><span class="line">    -DrepositoryId=nexus \</span><br><span class="line">    -Durl=http://127.0.0.1:8081/nexus/content/repositories/releases \</span><br><span class="line">    -Dfile=slashbin-1.0-1.x86_64.rpm</span><br></pre></td></tr></table></figure><h3 id="method-2"><a class="markdownIt-Anchor" href="#method-2"></a> Method 2</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -v -u admin:admin123 --upload-file slashbin-1.0-1.x86_64.rpm \</span><br><span class="line">http://127.0.0.1:8081/nexus/content/repositories/releases/com/github/diegopacheco/sandbox/devops/fpmtest/1.0.1/fpmtest-1.0.1.rpm</span><br></pre></td></tr></table></figure><h3 id="method-3"><a class="markdownIt-Anchor" href="#method-3"></a> Method 3</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">curl -v -F r=releases -F hasPom=false -F e=rpm -F g=com.github.diegopacheco.sandbox.devops -F a=fpmtest -F v=2.0 -F p=rpm -F file=@slashbin-1.0-1.x86_64.rpm -u admin:admin123 http://127.0.0.1:8081/nexus/service/local/artifact/maven/content</span><br></pre></td></tr></table></figure><h1 id="reference-links"><a class="markdownIt-Anchor" href="#reference-links"></a> Reference links</h1><p><a href="https://maven.apache.org/guides/mini/guide-encryption.html" target="_blank" rel="noopener">https://maven.apache.org/guides/mini/guide-encryption.html</a></p><p><a href="https://gist.github.com/diegopacheco/e04e90508451e8ce134b" target="_blank" rel="noopener">https://gist.github.com/diegopacheco/e04e90508451e8ce134b</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> java </tag>
            
            <tag> maven </tag>
            
            <tag> nexus </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Notes on setting up MySQL for a test env</title>
      <link href="2017/02/11/markdown/TechByVendorName/MySQL/mysqlTestEnvSetup/"/>
      <url>2017/02/11/markdown/TechByVendorName/MySQL/mysqlTestEnvSetup/</url>
      
        <content type="html"><![CDATA[<h1 id="on-windows"><a class="markdownIt-Anchor" href="#on-windows"></a> On Windows</h1><h2 id="start-and-trigger-mysql-cmd"><a class="markdownIt-Anchor" href="#start-and-trigger-mysql-cmd"></a> Start MySQL and open the mysql command prompt</h2><ul><li>Start the MySQL server with the default configuration</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mysqld</span><br></pre></td></tr></table></figure><ul><li>Connect to the MySQL server and open the command prompt</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mysql -u root -p -h localhost</span><br></pre></td></tr></table></figure><h2 id="cmd-run-under-mysql-cmd-prompt"><a class="markdownIt-Anchor" href="#cmd-run-under-mysql-cmd-prompt"></a> Commands to run at the mysql prompt</h2><ul><li>List databases</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">show databases;</span><br></pre></td></tr></table></figure><ul><li>Create a new database</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">create database poc;</span><br></pre></td></tr></table></figure><ul><li>Switch to a database</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">use poc;</span><br></pre></td></tr></table></figure><ul><li>List tables</li></ul> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">show tables;</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> mysql </tag>
            
            <tag> devops </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Notes on setting up Oracle DB for a test env</title>
      <link href="2017/02/11/markdown/TechByVendorName/OracleDB/SetupOracleXEAsIsolatedTestEnv/"/>
      <url>2017/02/11/markdown/TechByVendorName/OracleDB/SetupOracleXEAsIsolatedTestEnv/</url>
      
        <content type="html"><![CDATA[<h1 id="for-existing-db-build-isolated-test-from-scratch"><a class="markdownIt-Anchor" href="#for-existing-db-build-isolated-test-from-scratch"></a> For an existing DB, build an isolated test environment from scratch</h1><h2 id="install-oracle-xe"><a class="markdownIt-Anchor" href="#install-oracle-xe"></a> Install Oracle XE</h2><ul><li>Connect as the SYSTEM user</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">connect system/password;</span><br></pre></td></tr></table></figure><p>Use the password that you entered during installation.</p><ul><li>Create a user</li></ul><p>create user username identified by password;<br>Then give this user privileges for creating tables, views, and so on.</p><ul><li>Grant access</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">grant dba, resource, connect to username;</span><br><span class="line">GRANT Create Session TO username;</span><br></pre></td></tr></table></figure><h2 id="obtain-the-ddl-of-the-table-needed"><a class="markdownIt-Anchor" href="#obtain-the-ddl-of-the-table-needed"></a> Obtain the DDL of the required tables</h2><p>In the dev Oracle DB:</p><ol><li><p>Install Oracle XE.</p></li><li><p>Use SQL Developer to export the existing schema definition.</p></li><li><p>In the new database,<br>create the required tablespace:<br>create tablespace tablespacename datafile 'datafilename.dbf' size 40m online;<br>then change the tablespace's autoextend parameter:<br>ALTER DATABASE DATAFILE 'datafilename.dbf' AUTOEXTEND ON MAXSIZE UNLIMITED;</p></li><li><p>In SQL Developer, connect to the database as the created user.<br>Export the existing schema definition using SQL Developer.<br>Run it in the backup database.</p></li><li><p>Create the same schema by creating a user with that name:<br>create user &lt;schema-name&gt; identified by &lt;schema-name&gt;;</p></li></ol><h1 id="issues"><a class="markdownIt-Anchor" href="#issues"></a> Issues</h1><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">ErrorDesc</span>&gt;</span>Child SQL exception ( HY000 2289 [IBM][ODBC Oracle Wire Protocol driver][Oracle]ORA-02289: sequence does not exist )<span class="tag">&lt;/<span class="name">ErrorDesc</span>&gt;</span></span><br></pre></td></tr></table></figure><p><strong>Reason</strong></p><p>I forgot to create the triggers for the tables.</p><h2 id="set-up-oracle-with-docker-on-mac-os"><a class="markdownIt-Anchor" href="#set-up-oracle-with-docker-on-mac-os"></a> Set up Oracle with Docker on macOS</h2><p><a href="https://www.esentri.com/blog/2017/05/15/create-and-use-a-docker-container-with-oracle-xe-on-macos/" target="_blank" rel="noopener">https://www.esentri.com/blog/2017/05/15/create-and-use-a-docker-container-with-oracle-xe-on-macos/</a><br><a href="https://github.com/oracle/docker-images/tree/master/OracleDatabase" target="_blank" rel="noopener">https://github.com/oracle/docker-images/tree/master/OracleDatabase</a></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">./buildDockerImage.sh -v 12.2.0.1 -s -i</span><br><span class="line">docker images</span><br><span class="line">docker run \</span><br><span class="line">-p 1521:1521 -p 5500:5500 \</span><br><span class="line">-e ORACLE_SID=ORCLCDB \</span><br><span
class="line">-e ORACLE_PDB=ORCLPDB1 \</span><br><span class="line">-e ORACLE_CHARACTERSET=AL32UTF8 \</span><br><span class="line">-v /Users/ruiliu/data/oracledata:/opt/oracle/oradata \</span><br><span class="line">oracle/database:12.2.0.1-se2</span><br><span class="line"></span><br><span class="line">docker ps -a</span><br><span class="line">docker exec 35d6ef419dba ./setPassword.sh eVG7PQ0DnxcI</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">ORACLE PASSWORD FOR SYS, SYSTEM AND PDBADMIN: eVG7PQ0Dnxc=1</span><br><span class="line"></span><br><span class="line">LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 14-FEB-2018 12:17:23</span><br><span class="line"></span><br><span class="line">Copyright (c) 1991, 2016, Oracle.  
All rights reserved.</span><br><span class="line"></span><br><span class="line">Starting /opt/oracle/product/12.2.0.1/dbhome_1/bin/tnslsnr: please wait...</span><br><span class="line"></span><br><span class="line">TNSLSNR for Linux: Version 12.2.0.1.0 - Production</span><br><span class="line">System parameter file is /opt/oracle/product/12.2.0.1/dbhome_1/network/admin/listener.ora</span><br><span class="line">Log messages written to /opt/oracle/diag/tnslsnr/35d6ef419dba/listener/alert/log.xml</span><br><span class="line">Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1)))</span><br><span class="line">Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))</span><br><span class="line"></span><br><span class="line">Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">docker container ps -a</span><br><span class="line">docker start 35d6ef419dba</span><br><span class="line">docker exec -ti 35d6ef419dba sqlplus pdbadmin@ORCLPDB1</span><br><span class="line"></span><br><span class="line">docker stop 35d6ef419dba</span><br></pre></td></tr></table></figure><h1 id="connect-from-sql-developer"><a class="markdownIt-Anchor" href="#connect-from-sql-developer"></a> Connect from SQL Developer</h1><p>User: pdbadmin<br>Password: eVG7PQ0DnxcI<br>Port: 1521<br>Service name: ORCLPDB1</p><p>Data storage: ~/data/oracledata</p><h1 id="shutdown-the-environment"><a class="markdownIt-Anchor" href="#shutdown-the-environment"></a> Shut down the environment</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span
class="line">docker stop 35d6ef419dba</span><br><span class="line">docker container ps  -a</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> devops </tag>
            
            <tag> oracle </tag>
            
            <tag> docker </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>JavaScript loading issue</title>
      <link href="2017/01/05/markdown/JavaScript/LoadingSequence/"/>
      <url>2017/01/05/markdown/JavaScript/LoadingSequence/</url>
      
        <content type="html"><![CDATA[<p>Basics<br><a href="http://javascript.info/tutorial/adding-script-html" target="_blank" rel="noopener">http://javascript.info/tutorial/adding-script-html</a></p><p>Cause of the issue:<br>$timeout(function(){});</p><p>The default delay is zero, so the function is triggered without any delay.<br>The function itself relies on a value that had not been initialized yet, which caused the problem. After changing it to<br>$timeout(function(){},3000);<br>the issue was solved.</p>]]></content>
      
      
      
        <tags>
            
            <tag> javascript </tag>
            
            <tag> troubleshooting </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Set up a Spring Boot application to run as a Windows service</title>
      <link href="2016/12/11/markdown/Java/Spring/RunSpringBootAsWindowsService/"/>
      <url>2016/12/11/markdown/Java/Spring/RunSpringBootAsWindowsService/</url>
      
        <content type="html"><![CDATA[<h1 id="setup-a-spring-boot-application-to-run-on-windows-as-a-service"><a class="markdownIt-Anchor" href="#setup-a-spring-boot-application-to-run-on-windows-as-a-service"></a> Set up a Spring Boot application to run as a Windows service</h1><h1 id="reference-links"><a class="markdownIt-Anchor" href="#reference-links"></a> Reference links</h1><p><a href="https://docs.spring.io/spring-boot/docs/current/reference/html/deployment-install.html" target="_blank" rel="noopener">https://docs.spring.io/spring-boot/docs/current/reference/html/deployment-install.html</a></p><p><a href="https://github.com/kohsuke/winsw/blob/master/doc/installation.md" target="_blank" rel="noopener">https://github.com/kohsuke/winsw/blob/master/doc/installation.md</a></p><p><a href="https://github.com/snicoll-scratches/spring-boot-daemon" target="_blank" rel="noopener">https://github.com/snicoll-scratches/spring-boot-daemon</a></p><p>Basically, winsw wraps an executable so that it can run as a Windows service.</p><p>Tips:</p><ol><li>Check the winsw log for the actual command being run.</li><li>Security: configure the service to run as a specific user.</li></ol><p>A basic example:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">service</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">id</span>&gt;</span>myapp<span class="tag">&lt;/<span class="name">id</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">name</span>&gt;</span>myapp<span
class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">description</span>&gt;</span>this service runs myapp solution<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">env</span> <span class="attr">name</span>=<span class="string">"APP_HOME"</span> <span class="attr">value</span>=<span class="string">"%BASE%"</span>/&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">logpath</span>&gt;</span>%BASE%\logs<span class="tag">&lt;/<span class="name">logpath</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">logmode</span>&gt;</span>rotate<span class="tag">&lt;/<span class="name">logmode</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">executable</span>&gt;</span>%BASE%/jre1.8.0_121/bin/java.exe<span class="tag">&lt;/<span class="name">executable</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">arguments</span>&gt;</span>-Xmx256m -Dlog4j.configurationFile=%BASE%\config\log4j2.xml -jar %BASE%\myapp.jar --loader.path=%BASE%\config --spring.config.location=%BASE%\config\myapp.properties<span class="tag">&lt;/<span class="name">arguments</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">service</span>&gt;</span></span><br></pre></td></tr></table></figure><h1 id="externalize-spring-config-file"><a class="markdownIt-Anchor" href="#externalize-spring-config-file"></a> Externalize the Spring config file</h1><p><a href="https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html" target="_blank" rel="noopener">https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html</a><br>For example:</p><p>--spring.config.location=/opt/myconfig.properties</p>]]></content>
      
      
      
        <tags>
            
            <tag> java </tag>
            
            <tag> spring </tag>
            
            <tag> springboot </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>CORS issue with Mulesoft</title>
      <link href="2016/11/11/markdown/TechByVendorName/Mulesoft/Mule-CORS-Issue/"/>
      <url>2016/11/11/markdown/TechByVendorName/Mulesoft/Mule-CORS-Issue/</url>
      
        <content type="html"><![CDATA[<h1 id="cors-issue-when-using-mule-to-serve-micro-services-to-frontend"><a class="markdownIt-Anchor" href="#cors-issue-when-using-mule-to-serve-micro-services-to-frontend"></a> CORS issue when using Mule to serve microservices to a frontend</h1><p>When an Angular frontend served from a different origin calls a RESTful API exposed by Mule, the browser blocks the cross-domain HTTP request with:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">( cross domain HTTP request )</span><br><span class="line">XMLHttpRequest cannot load http://localhost:8081/iserver/flows. No &apos;Access-Control-Allow-Origin&apos; header is present on the requested resource. Origin &apos;http://xxxxxxx:8080&apos; is therefore not allowed access.</span><br></pre></td></tr></table></figure><p>Related document:</p><p><a href="http://blogs.mulesoft.com/dev/anypoint-platform-dev/cross-domain-rest-calls-using-cors/" target="_blank" rel="noopener">http://blogs.mulesoft.com/dev/anypoint-platform-dev/cross-domain-rest-calls-using-cors/</a></p><h2 id="resolution"><a class="markdownIt-Anchor" href="#resolution"></a> resolution</h2><p>Add a response header to the Mule flow's HTTP response:</p><p>name: Access-Control-Allow-Origin</p><p>value: the allowed client origin, in the form http(s)://hostname:port</p><p>If the browser sends a preflight OPTIONS request (for example, for PUT/DELETE or custom request headers), the flow must also answer it with the <code>Access-Control-Allow-Methods</code> and <code>Access-Control-Allow-Headers</code> headers.</p>]]></content>
      
      
      
        <tags>
            
            <tag> mulesoft </tag>
            
            <tag> cors </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Git command cheat sheet</title>
      <link href="2016/05/05/markdown/BackToBasic/Git/GitCmd/"/>
      <url>2016/05/05/markdown/BackToBasic/Git/GitCmd/</url>
      
        <content type="html"><![CDATA[<p>S1: you have an existing repo and want to start tracking its changes with git</p><p>Create the remote (bare) repo<br>$ mkdir /path/to/new_repo<br>$ cd /path/to/new_repo<br>$ git --bare init</p><p>In the existing repo<br>$ cd /path/to/existing_repo<br>$ git init<br>$ vi .gitignore<br>$ git add .<br>$ git commit -m "init"</p><p>$ git push --set-upstream /path/to/new_repo master</p><p>If adding the remote fails because it already exists:<br>$ git remote add origin /path/to/current-existing-lib<br>fatal: remote origin already exists.<br>remove it with<br>$ git remote rm origin<br>and then add it again</p><p>Create a develop branch<br>git branch develop<br>git checkout develop</p><p>Follow the develop branching flow described in<br><a href="https://gist.github.com/yesmeck/4245406" target="_blank" rel="noopener">https://gist.github.com/yesmeck/4245406</a></p><p>Discard all changes in the current branch that are not yet staged (with git add .)<br>git checkout -- .</p><p>S2: create a new bare repo remotely and then start from local</p><p>git clone /path/to/remote/bare/repo</p><p>Check the remote repository<br>git ls-remote</p><p>Print all branches<br>git show-branch -a</p><p>Common working process<br>$ git checkout -b feature/xxx develop<br># write code, commit, write code, commit...<br># when the feature is finished, merge it back into develop<br>$ git checkout develop<br># always add --no-ff to preserve the branch merge history<br>$ git merge --no-ff feature/xxx<br>$ git branch -d feature/xxx</p>]]></content>
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> git </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Design Pattern - Factory</title>
      <link href="2016/01/05/markdown/Java/DesignPattern/Factory/"/>
      <url>2016/01/05/markdown/Java/DesignPattern/Factory/</url>
      
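        <content type="html"><![CDATA[<h1 id="factory"><a class="markdownIt-Anchor" href="#factory"></a> Factory</h1><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">public class FactoryDemo &#123;</span><br><span class="line">    <span class="comment">// Bike and Car are illustrative products; each factory decides which one is created</span></span><br><span class="line">    interface Transport &#123; String drive(); &#125;</span><br><span class="line">    static class Bike implements Transport &#123; public String drive() &#123; return "riding a bike"; &#125; &#125;</span><br><span class="line">    static class Car implements Transport &#123; public String drive() &#123; return "driving a car"; &#125; &#125;</span><br><span class="line">    interface TransportFactory &#123; Transport create(); &#125;</span><br><span class="line">    static class BikeFactory implements TransportFactory &#123; public Transport create() &#123; return new Bike(); &#125; &#125;</span><br><span class="line">    static class CarFactory implements TransportFactory &#123; public Transport create() &#123; return new Car(); &#125; &#125;</span><br><span class="line"></span><br><span class="line">    public static void main(String[] args) &#123;</span><br><span class="line">        <span class="comment">// real case invoking</span></span><br><span class="line">        invokeSharedCode(new BikeFactory());</span><br><span class="line">        invokeSharedCode(new CarFactory());</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// the created object depends on the factory type; its behavior is defined by the Transport interface</span></span><br><span class="line">    public static void invokeSharedCode(TransportFactory factory) &#123;</span><br><span class="line">        Transport transport = factory.create();</span><br><span class="line">        System.out.println(transport.drive());</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="core-concept"><a class="markdownIt-Anchor" href="#core-concept"></a> core concept</h2><p>The shared code is not hard-coded to typeA or typeB. Instead, the factories for typeA and typeB implement the same factory interface, so the shared code creates either type through the same <code>create()</code> call.</p><p>After creation, the shared code can invoke the same method (here <code>drive()</code>), which typeA and typeB each implement.</p>]]></content>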
      
      
      
        <tags>
            
            <tag> basic </tag>
            
            <tag> java </tag>
            
            <tag> designpattern </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Angular Related</title>
      <link href="2015/11/11/markdown/JavaScript/AngularJS/Angular_Related/"/>
      <url>2015/11/11/markdown/JavaScript/AngularJS/Angular_Related/</url>
      
        <content type="html"><![CDATA[<h2 id="env-setup"><a class="markdownIt-Anchor" href="#env-setup"></a> Env Setup</h2><p><a href="https://www.npmjs.com/package/http-server" target="_blank" rel="noopener">https://www.npmjs.com/package/http-server</a></p><p>If you're running Node.js, http-server is super easy to use. Install it with <code>npm install -g http-server</code>.<br>After installation, cd into your project folder and run <code>http-server -o</code>; the <code>-o</code> flag opens the page in the browser.<br>Sample command:<br>pathtonode/node pathtohttpserver/http-server [pathtoangularproject] -p port -a address</p><h2 id="compare-strings"><a class="markdownIt-Anchor" href="#compare-strings"></a> Compare Strings</h2><p><a href="https://docs.angularjs.org/api/ng/function/angular.equals" target="_blank" rel="noopener">https://docs.angularjs.org/api/ng/function/angular.equals</a></p><h2 id="directory-structure-best-practice"><a class="markdownIt-Anchor" href="#directory-structure-best-practice"></a> Directory structure best practice</h2><p>Refer to the official recommendation:<br><a href="https://github.com/angular/angular-seed/tree/master/app" target="_blank" rel="noopener">https://github.com/angular/angular-seed/tree/master/app</a></p><h2 id="to-debug-angular-using-chrome"><a class="markdownIt-Anchor" href="#to-debug-angular-using-chrome"></a> To debug Angular using Chrome</h2><p>Use the AngularJS Batarang extension.</p><h2 id="warning-tried-to-load-angular-more-than-once"><a class="markdownIt-Anchor" href="#warning-tried-to-load-angular-more-than-once"></a> WARNING: Tried to load angular more than once.</h2><p><a href="http://stackoverflow.com/questions/22595878/tried-to-load-angular-more-than-once" target="_blank" rel="noopener">http://stackoverflow.com/questions/22595878/tried-to-load-angular-more-than-once</a><br>TODO: update the page routing</p><h2 id="scope-not-scope"><a class="markdownIt-Anchor" href="#scope-not-scope"></a> $scope not $Scope</h2><p>Injecting <code>$Scope</code> (capital S) instead of <code>$scope</code> throws: Unknown provider: $ScopeProvider &lt;- $Scope</p><h2 
id="angular-invoke-restful-service-that-return-purely-string-array"><a class="markdownIt-Anchor" href="#angular-invoke-restful-service-that-return-purely-string-array"></a> Angular invoking a RESTful service that returns a plain string array</h2><p><a href="https://mariuszprzydatek.com/2013/12/13/tricky-behavior-of-angularjs-resource-service/" target="_blank" rel="noopener">https://mariuszprzydatek.com/2013/12/13/tricky-behavior-of-angularjs-resource-service/</a></p><h2 id="prod"><a class="markdownIt-Anchor" href="#prod"></a> PROD</h2><p><a href="http://brandonbohling.com/technology/ng-nginx-pm2/" target="_blank" rel="noopener">http://brandonbohling.com/technology/ng-nginx-pm2/</a><br><a href="https://www.digitalocean.com/community/tutorials/how-to-use-pm2-to-setup-a-node-js-production-environment-on-an-ubuntu-vps" target="_blank" rel="noopener">https://www.digitalocean.com/community/tutorials/how-to-use-pm2-to-setup-a-node-js-production-environment-on-an-ubuntu-vps</a></p>]]></content>
      
      
      
        <tags>
            
            <tag> javascript </tag>
            
            <tag> angular </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Mule flow single threading</title>
      <link href="2015/11/11/markdown/TechByVendorName/Mulesoft/MuleSingleThreading/"/>
      <url>2015/11/11/markdown/TechByVendorName/Mulesoft/MuleSingleThreading/</url>
      
        <content type="html"><![CDATA[<h1 id="sample-flow-implement-single-threading"><a class="markdownIt-Anchor" href="#sample-flow-implement-single-threading"></a> Sample flow implementing single-threaded processing</h1><p>The flow client raises a request and gets a response immediately.<br>The request is then processed asynchronously, in sequence, on a single thread.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</span><br><span class="line"></span><br><span class="line">&lt;mule xmlns:vm=&quot;http://www.mulesoft.org/schema/mule/vm&quot; xmlns:http=&quot;http://www.mulesoft.org/schema/mule/http&quot; xmlns=&quot;http://www.mulesoft.org/schema/mule/core&quot; 
xmlns:doc=&quot;http://www.mulesoft.org/schema/mule/documentation&quot;</span><br><span class="line">xmlns:spring=&quot;http://www.springframework.org/schema/beans&quot;</span><br><span class="line">xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;</span><br><span class="line">xsi:schemaLocation=&quot;http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd</span><br><span class="line">http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd</span><br><span class="line">http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd</span><br><span class="line">http://www.mulesoft.org/schema/mule/vm http://www.mulesoft.org/schema/mule/vm/current/mule-vm.xsd&quot;&gt;</span><br><span class="line">    &lt;http:listener-config name=&quot;HTTP_Listener_Configuration&quot; host=&quot;0.0.0.0&quot; port=&quot;8081&quot; doc:name=&quot;HTTP Listener Configuration&quot;/&gt;</span><br><span class="line"></span><br><span class="line">    &lt;queued-asynchronous-processing-strategy name=&quot;allow1Threads&quot; maxThreads=&quot;1&quot; doc:name=&quot;Queued Asynchronous Processing Strategy&quot;/&gt;</span><br><span class="line">    &lt;vm:connector name=&quot;VM&quot; validateConnections=&quot;true&quot; doc:name=&quot;VM&quot;&gt;</span><br><span class="line">        &lt;vm:queue-profile&gt;</span><br><span class="line">            &lt;default-persistent-queue-store/&gt;</span><br><span class="line">        &lt;/vm:queue-profile&gt;</span><br><span class="line">    &lt;/vm:connector&gt;</span><br><span class="line">    &lt;vm:connector name=&quot;VM1&quot; validateConnections=&quot;true&quot; doc:name=&quot;VM&quot;&gt;</span><br><span class="line">        &lt;vm:queue-profile&gt;</span><br><span class="line">            &lt;default-persistent-queue-store/&gt;</span><br><span class="line">        
&lt;/vm:queue-profile&gt;</span><br><span class="line">    &lt;/vm:connector&gt;</span><br><span class="line">    &lt;flow name=&quot;vmtransactionFlow&quot;&gt;</span><br><span class="line">        &lt;http:listener config-ref=&quot;HTTP_Listener_Configuration&quot; path=&quot;/registerTask&quot; doc:name=&quot;HTTP&quot;/&gt;</span><br><span class="line">        &lt;set-payload value=&quot;#[java.util.UUID.randomUUID().toString()]&quot; doc:name=&quot;Set Payload&quot;/&gt;</span><br><span class="line">        &lt;vm:outbound-endpoint exchange-pattern=&quot;one-way&quot; path=&quot;TaskQ&quot; connector-ref=&quot;VM&quot; doc:name=&quot;VM&quot;/&gt;</span><br><span class="line">        &lt;logger message=&quot;Task Received, #[payload]&quot; level=&quot;INFO&quot; doc:name=&quot;Logger&quot;/&gt;</span><br><span class="line">    &lt;/flow&gt;</span><br><span class="line">    &lt;flow name=&quot;vmtransactionFlow1&quot; processingStrategy=&quot;allow1Threads&quot;&gt;</span><br><span class="line">        &lt;vm:inbound-endpoint exchange-pattern=&quot;one-way&quot; path=&quot;TaskQ&quot; connector-ref=&quot;VM&quot; doc:name=&quot;VM&quot;&gt;</span><br><span class="line">        &lt;/vm:inbound-endpoint&gt;</span><br><span class="line">        &lt;logger message=&quot;Started Processing #[payload]&quot; level=&quot;INFO&quot; doc:name=&quot;Logger&quot;/&gt;</span><br><span class="line">        &lt;component class=&quot;vmtransaction.demo.ProcessingTask&quot; doc:name=&quot;Java&quot;/&gt;</span><br><span class="line">        &lt;logger message=&quot;finished the sync vm invoke #[payload]&quot; level=&quot;INFO&quot; doc:name=&quot;Logger&quot;/&gt;</span><br><span class="line">    &lt;/flow&gt;</span><br><span class="line">&lt;/mule&gt;</span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> mulesoft </tag>
            
            <tag> threading </tag>
            
        </tags>
      
    </entry>
    
    
    
    <entry>
      <title>Mule flow with groovy script</title>
      <link href="2013/11/11/markdown/TechByVendorName/Mulesoft/MuleFlowWithGroovy/"/>
      <url>2013/11/11/markdown/TechByVendorName/Mulesoft/MuleFlowWithGroovy/</url>
      
        <content type="html"><![CDATA[<figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">flow</span> <span class="attr">name</span>=<span class="string">"get:/setVariable:api-config"</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">set-variable</span> <span class="attr">variableName</span>=<span class="string">"username"</span> <span class="attr">value</span>=<span class="string">""</span> <span class="attr">doc:name</span>=<span class="string">"username"</span>/&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">scripting:component</span> <span class="attr">doc:name</span>=<span class="string">"Groovy"</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">scripting:script</span> <span class="attr">engine</span>=<span class="string">"Groovy"</span>&gt;</span></span><br><span class="line">&lt;![CDATA[</span><br><span class="line">    String value = 'felipe'</span><br><span class="line">    message.setInvocationProperty('username', value)</span><br><span class="line">]]&gt;<span class="tag">&lt;/<span class="name">scripting:script</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">scripting:component</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">flow</span>&gt;</span></span><br></pre></td></tr></table></figure><figure class="highlight groovy"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//More ways to set values inside Groovy</span></span><br><span class="line"><span class="keyword">import</span> org.mule.api.transport.PropertyScope <span class="comment">// needed for PropertyScope.INBOUND below (Mule 3.x)</span></span><br><span class="line">message.setInvocationProperty(<span class="string">'myFlowVariable'</span>, <span class="string">'value'</span>) <span class="comment">// sets a flow variable, like &lt;set-variable/&gt;</span></span><br><span class="line">message.setOutboundProperty(<span class="string">'myProperty'</span>, <span class="string">'value'</span>) <span class="comment">// sets an outbound message property, like &lt;set-property/&gt;</span></span><br><span class="line">message.setProperty(<span class="string">'myInboundProperty'</span>, <span class="string">'value'</span>, PropertyScope.INBOUND) <span class="comment">// sets an inbound property</span></span><br></pre></td></tr></table></figure>]]></content>
      
      
      
        <tags>
            
            <tag> mulesoft </tag>
            
            <tag> groovy </tag>
            
        </tags>
      
    </entry>
    
    
  
  
</search>