From ebe7a1baa779f2b930d3585a88bc94ef5f83613b Mon Sep 17 00:00:00 2001
From: Hayoung Lim <59460178+hihahayoung@users.noreply.github.com>
Date: Thu, 16 May 2024 16:52:55 +0900
Subject: [PATCH] Update README.md (#86)
* Update README.md
* chore: minor modifications
---
README.md | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 113 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 95fa691..56498e0 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,117 @@
-# ticketing-infra
-
+## Introduction
+Our primary goal was to build a scalable and efficient backend for a ticketing service that could handle high loads with minimal latency. The project focuses exclusively on the backend and infrastructure aspects, omitting a frontend interface to concentrate on the underlying mechanics and performance. This repository highlights the infrastructure conponents, illustrating our journey through building infrastructure, creating CI/CD pipelines, and managing containers.
+
+Programming Language & Framework: We chose Kotlin and Spring Boot for their expressive syntax and the powerful suite of tools for building web applications efficiently.
+
+Database: We utilized MySQL for its powerful database locking functions, which are essential in managing concurrent operations effectively within our ticketing system. This capability ensures data integrity and consistent performance under high-load scenarios.
+
+Containerization & Orchestration: Kubernetes was used to manage our containerized applications, enabling easy scaling and management across multiple servers hosted on AWS.
+
+Configuration Management: Helm charts helped us streamline the installation and management of our Kubernetes applications.
+
+Continuous Deployment: ArgoCD was employed to automate the deployment process, ensuring our changes were seamlessly and reliably pushed to production.
+
+Infrastructure as Code: Terraform allowed us to define our infrastructure using configuration files, which helped in maintaining consistency and ease of deployment across environments.
+
+Performance Testing: We employed K6 to conduct spike tests, simulating scenarios with excessive simultaneous access to evaluate the performance and robustness of our system under extreme conditions.
+
+Monitoring: We integrated Prometheus and Grafana to monitor our applications and infrastructure, ensuring high availability and performance through real-time insights.
+
+
+## Additional Repository
+
+- [Backend](https://github.com/f-lab-clone/ticketing-backend)
+
+
+## Infrastructure
+
+![image](https://github.com/f-lab-clone/ticketing-backend/assets/41976906/354a8f92-852c-4cd1-8713-de05e8aa83f0)
+
+In the course of developing our infrastructure, we tackled a range of infrastructure challenges and optimizations. Below are key resources and discussions that provide insights into our decision-making process and the solutions we implemented:
+
+#### Deployment and Configuration with Terraform
+
+- **[Building the Deployment Environment with Terraform](https://github.com/f-lab-clone/ticketing-infra/issues/1)**: This issue tracks our use of Terraform to automate the provisioning of our entire cloud environment, focusing on reliability and scalability.
+
+#### Cost Optimization Strategies
+
+- **[Migration from AWS ALB to Nginx Ingress (Baremetal)](https://github.com/f-lab-clone/ticketing-infra/issues/42)**: To reduce costs, we replaced AWS ALB with a more cost-effective Nginx Ingress setup on bare metal. This discussion details the reasons behind the change and the implementation process.
+- **[Using Public Subnet Node Group to Address NAT Gateway Cost Issues](https://github.com/f-lab-clone/ticketing-infra/issues/61#issuecomment-1748931936)**: We opted to configure our EKS cluster using a public subnet node group to avoid high costs associated with NAT gateways.
+
+#### Monitoring and Metrics
+
+- **[How to Scrape Metrics from Multiple Pods Using Spring Actuator](https://medium.com/@hayounglim/prometheus-helm-how-to-scrape-metrics-from-multiple-pods-using-spring-actuator-08fccd0cf69e)**: This article explains how we set up Prometheus, via Helm, to scrape metrics from multiple pods, enhancing our monitoring capabilities using Spring Boot’s Actuator.
+
+#### Security Enhancements
+
+- **[Injecting Secrets into EKS Pods Using Terraform](https://devkly.com/devops/terraform-secret-manager/)**: We explored methods to securely inject secrets into our Kubernetes pods using Terraform, ensuring sensitive data is managed safely and effectively.
+
+
+## Backend
+
+![Queue System Architecture](https://github.com/f-lab-clone/ticketing-backend/assets/41976906/37d47dc4-c795-437e-afb8-c13957f2c3b6)
+
+- **[Queue System Design Issues](https://github.com/f-lab-clone/ticketing-backend/issues/72#issuecomment-1763249911)**: Discusses considerations for preventing update losses, implementing non-blocking APIs, and choosing data structures for the queue system.
+
+#### Project Package Structure and Conventions
+
+- **[Project Package Structure Considerations](https://github.com/f-lab-clone/ticketing-backend/wiki/%ED%94%84%EB%A1%9C%EC%A0%9D%ED%8A%B8-%ED%8C%A8%ED%82%A4%EC%A7%80-%EA%B5%AC%EC%A1%B0)**: Deliberations on how to organize the project's package structure effectively.
+- **[Convention Documentation](https://github.com/f-lab-clone/ticketing-backend/wiki/Convention)**: Defines conventions for branch naming, commit messages, HTTP response structures, serialization, testing, and more.
+
+#### API Optimization and Testing
+
+- **[API Enhancement Considerations](https://github.com/f-lab-clone/ticketing-backend/issues/52)**: Detailed discussion on time conventions, data transfer between layers (errors, responses), logging best practices, and their implementation.
+- **[Maintaining Over 80% Test Coverage with Jacoco and Codecov](https://github.com/f-lab-clone/ticketing-backend/issues/5)**: Outlines strategies and efforts to maintain a high level of test coverage using Jacoco and Codecov.
+- **[Integration Testing Environment with Testcontainers and MySQL Container](https://github.com/f-lab-clone/ticketing-backend/issues/31)**: Describes the setup of an integrated testing environment using Testcontainers and a MySQL Docker container to enhance testing reliability and consistency.
+
+## CD Pipeline
+
+
+
+
+## Performance Test
+
+![Performance Test Result](https://github.com/f-lab-clone/ticketing-backend/assets/41976906/5cc5b165-fdde-4b67-968f-b94dfc037cfd)
+
+- **[Considerations for Building the Performance Test Environment](https://github.com/f-lab-clone/ticketing-infra/issues/32)**: A detailed discussion on the setup and challenges of creating a suitable environment for performance testing.
+- **[Detailed Performance Test Scenarios](https://github.com/f-lab-clone/ticketing-infra/wiki/Desired-State%EB%A5%BC-%EC%A4%91%EC%8B%AC%EC%9C%BC%EB%A1%9C-%EC%82%B4%ED%8E%B4%EB%B3%B4%EB%8A%94-%EC%9D%B8%ED%94%84%EB%9D%BC-%ED%99%98%EA%B2%BD#%EC%96%B4%EB%96%BB%EA%B2%8C-%EC%84%B1%EB%8A%A5-%ED%85%8C%EC%8A%A4%ED%8A%B8-%ED%99%98%EA%B2%BD%EC%9D%84-%EA%B5%AC%EC%84%B1%ED%96%88%EB%8A%94%EA%B0%80)**: This link provides a thorough description of the performance test scenarios used in our project.
+
+#### Cost Management and Test Scripts
+
+- **[Calculating Costs for Spike Testing Using ALB LCU](https://github.com/f-lab-clone/ticketing-infra/issues/62)**: An analysis of cost implications when using AWS ALB Load Capacity Units (LCU) for spike testing.
+- **[Creating K6 Performance Test Scripts](https://github.com/f-lab-clone/ticketing-backend/issues/83)**: Discussion and documentation on how we developed K6 scripts for our performance testing.
+
+#### Database and Monitoring Setup
+
+- **[Database Setup for Test Data and Large-Scale Data Insertions](https://github.com/f-lab-clone/ticketing-backend/issues/101)**: Outlines our approach to preparing the database for testing, including the creation of large datasets.
+- **[Building a Monitoring Environment with Prometheus and Grafana](https://github.com/f-lab-clone/ticketing-infra/issues/30)**: Details on how we configured Prometheus and Grafana to monitor our application and infrastructure during the performance tests.
+
+
+## Performance Test Report
+
+#### [SignIn Spike Test Report](https://github.com/f-lab-clone/ticketing-backend/issues/105)
+- Improved Slow Queries by adding an index to a single-column with 1 million records.
+- Observed changes in CPU performance due to encryption: increased CPU core count and observed changes in encryption difficulty based on [encryption level adjustments](https://github.com/f-lab-clone/ticketing-backend/issues/107).
+
+#### [JVM Warm Up Test Report](https://github.com/f-lab-clone/ticketing-backend/issues/108)
+- Observed changes in JVM CodeHeap and performance by repeating the same test after process creation.
+
+#### [3000 Requests Per Minute Spike Test Report](https://github.com/f-lab-clone/ticketing-backend/issues/135)
+- Improved performance of `SELECT COUNT(*)` on ten million records by implementing [NoOffset](https://github.com/f-lab-clone/ticketing-backend/issues/113).
+- Introduced a queue system after considering competition for locks on a single resource (=Event) and observed tests.
+
+#### [6000 Requests Per Minute Spike Test Report](https://github.com/f-lab-clone/ticketing-backend/issues/144)
+- Improved CPU resource usage by modifying thread pool strategy for thread creation.
+- Improved Pending Connection by modifying DB Connection Pool strategy.
+- Improved latency by applying Redis caching.
+
+
+## Contributors
-https://github.com/f-lab-clone/ticketing-backend
+| Junha Ahn | Hayoung Lim | Jeongseop Park | Minjun Kim |
+| :----: | :----: | :----: |:----: |
+| [@junha-ahn](https://github.com/junha-ahn) | [@hihahayoung](https://github.com/hihahayoung) | [@ParkJeongseop](https://github.com/ParkJeongseop) | [@minjun3021](https://github.com/minjun3021) |
+|Infrastructure (Leader) |Infrastructure |Backend |Backend |