diff --git a/.deco/blocks/Preview%20blog%2Floaders%2FBlogpostList.ts.json b/.deco/blocks/Preview%20blog%2Floaders%2FBlogpostList.ts.json new file mode 100644 index 00000000..8c0ff880 --- /dev/null +++ b/.deco/blocks/Preview%20blog%2Floaders%2FBlogpostList.ts.json @@ -0,0 +1,3 @@ +{ + "__resolveType": "blog/loaders/BlogpostList.ts" +} \ No newline at end of file diff --git a/.deco/blocks/blogposts.json b/.deco/blocks/blogposts.json index 5d6817f4..9fe2255e 100644 --- a/.deco/blocks/blogposts.json +++ b/.deco/blocks/blogposts.json @@ -1 +1,895 @@ -{"list":{"posts":[{"body":{"en":{"title":"🚀 A New Era of Bots and Crawlers","descr":"How They’re Draining Your Website’s Resources","content":"![Bots and Crawlers](https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/f3f37747-4e4f-4e00-bd5a-bf54bff0a3ec)\n\nYour website’s traffic doesn’t just come from human visitors; bots play a significant role too. **Search engines, social media platforms, and even AI systems deploy automated tools (robots, or 'bots') to crawl your site, extracting content and valuable information.** To rank well on Google, for example, your content must be well-structured, with clear titles, readable text, and highly relevant information. This entire analysis is conducted by bots crawling your site!\n\nBut here’s the catch: **Every time a bot crawls your site, it’s not “free.”** Each request made by a bot consumes resources—whether it's computational power or bandwidth. Most major bots respect a special file (`robots.txt`) that tells them which parts of your site they can or cannot access. As a site owner, you can control which bots are allowed to crawl your site.\n\n```\nUser-agent: *\nAllow: /\n```\n\n_A simple rule that allows all bots to access all pages._\n\nLet’s look at the impact this can have. \n\nIn May, across various Deco sites, bots were responsible for **over 50% of the bandwidth consumed**, even though they didn’t make up the majority of the requests.\n\n![Bots Bandwidth Consumption](https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/54a19e86-09b8-420d-a392-67ef33a4af93)\n\nDespite accounting for less than 20% of traffic, **bots often consume significantly more bandwidth due to the specific pages they access.** They tend to spend more time on larger pages, such as category pages in online stores, which naturally load more data. These pages often feature product variations and filters, making them even more data-heavy.\n\nWhile Google’s bot respects the `nofollow` attribute, which prevents links from being crawled, not all bots do. \nThis means that pages with filter variations also need a `noindex` meta tag or a more specialized `robots.txt` configuration.\n\n## AI: The New Gold Rush\n\nAI is changing the game when it comes to data extraction, release, and value. \n\nThe demand for massive amounts of data for processing has led to the creation of more bots, particularly crawlers operating on the web. 
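For site owners who decide they don’t want to hand their content to AI crawlers at all, the opt-out uses the same `robots.txt` mechanism shown earlier. A minimal sketch, assuming the crawlers identify themselves with their published user agents (`GPTBot`, `ClaudeBot`, and `CCBot` are common examples; adjust the list to the bots you actually see in your logs):\n\n```\nUser-agent: GPTBot\nUser-agent: ClaudeBot\nUser-agent: CCBot\nDisallow: /\n```\n\n_This is only a request: as we’ll see below, some AI bots ignore `robots.txt` entirely._\n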
**Data is more valuable than ever, yet there’s no guarantee of immediate returns for those who hand over their data to third parties.** The third-largest consumer of bandwidth (Amazonbot) and several others (Ahrefs, Semrush, Bing) are known as “good bots.” **These verified bots respect the `robots.txt` file, allowing you to control how and what they access.** A possible configuration for managing these bots is shown below:\n\n```\nUser-agent: googlebot\nUser-agent: bingbot\nAllow: /\nDisallow: /search\n\nUser-agent: *\nAllow: /$\nDisallow: /\n```\n\n_This allows Google and Bing bots to crawl your site, except for search pages, while restricting all other bots to the site’s root._\n\nThis setup grants broad access to valuable, known bots but limits overly aggressive crawling of all your site’s pages. However, notice how the second-highest bandwidth consumer is ClaudeBot—an **AI bot notorious for consuming large amounts of data while disregarding the `robots.txt` file.** In this new AI-driven world, we’re seeing more of these kinds of bots.\n\nAt deco.cx, we offer a standard `robots.txt` similar to the example above for our sites, but for bots that don’t respect this standard, the only way to control access is through blocking them at the CDN (in our case, Cloudflare). At Deco, we use three approaches to block these bots:\n\n- **Block by User-Agent**: Bots that ignore `robots.txt` but have a consistent user-agent can be blocked directly at our CDN.\n\n- **Challenge by ASN**: Some bots, especially malicious ones, come from networks known for such attacks. We place a challenge on these networks, making it difficult for machines to solve.\n\n- **Limit Requests by IP**: After a certain number of requests from a single origin, we present a challenge that users must solve correctly or face a temporary block.\n\nThese rules have effectively controlled most bots…\n\n…except Facebook.\n\n## “Facebook, Are You Okay?”\n\nWe’ve discussed bots that respect `robots.txt`. And then there’s Facebook.\n\nJust before Facebook’s new privacy policy went into effect—allowing user data to be used for AI training—we noticed a significant spike in the behavior of Facebook’s bot on our networks. This resulted in a substantial increase in data consumption, as shown in the graph below.\n\n[More details on Facebook’s new privacy policy](https://www.gov.br/anpd/pt-br/assuntos/noticias/anpd-determina-suspensao-cautelar-do-tratamento-de-dados-pessoais-para-treinamento-da-ia-da-meta)\n\n![Traffic June 2024](https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/206499a0-a2b2-4d49-9df0-c8f037a16101)\n\n_Aggregate data traffic for a set of sites in June 2024._\n\nThe Facebook bot typically fetches data when a link is shared on the platform, including details about the image and site information. However, we discovered that the bot wasn’t just fetching this data—it was performing a full crawl of sites, aggressively and without respecting `robots.txt`!\n\nMoreover, Facebook uses various IPv6 addresses, meaning the crawl doesn’t come from a single or a few IPs, making it difficult to block with our existing controls. \nWe didn’t want to block Facebook entirely, as this would disrupt sharing, but we also didn’t want to allow their bots to consume excessive resources. 
To address this, we implemented more specific control rules, limiting access across Facebook’s entire network…\n\n![Traffic July 2024](https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/9cdc728c-af5e-44dc-802b-6da8550cb207)\n_Aggregate data traffic for a set of sites in July 2024._\n\n…which proved to be highly effective.\n\n## Blocking Too Much Could Hurt Your Presence in Emerging Bots or Technologies\n\nA final word of caution: adopting an overly aggressive approach has its downsides. \nRestricting access to unknown bots might prevent new technologies and tools that could benefit your site from interacting with it. For example, a new AI that could recommend specific products to visitors might be inadvertently blocked. \nIt’s crucial to strike a balance, allowing selective bot access in line with market evolution and your business needs.\n\nIn summary, bots and crawlers are valuable allies, but managing their access requires strategic thinking. \nThe key is to allow only beneficial bots to interact with your site while staying alert to new technologies that might emerge. This balanced approach will ensure that your business maximizes return on traffic and resource consumption."}},"tags":[],"path":"bots","date":"08/15/2024","author":"Matheus Gaudêncio","img":"https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/971f5e4d-fc34-4b4e-98a3-fd09eeae079a"},{"img":"https://ozksgdmyrqcxcwhnbepg.supabase.co/storage/v1/object/public/assets/530/05ed5d60-22c1-459a-8323-53d3b2b3b3d9","body":{"en":{"descr":"How to refactor common design patterns from React into Native Web","title":"Leveraging native web APIs to reduce bundle size and improve performance","content":"\n\nIn modern web development, libraries like React have become the norm for building dynamic and interactive user interfaces. However, relying heavily on such libraries can sometimes lead to larger bundle sizes and reduced performance. By leveraging native web APIs, we can accomplish common design patterns more efficiently, enhancing performance and overall web compatibility. In this article, we will explore how to use these APIs to reduce our dependence on React hooks like `useState`.\n\n### Case Study: The Hidden `<input type=\"checkbox\">` Hack\n\nOne often overlooked technique in native web development is the hidden `<input type=\"checkbox\">` hack. This trick, which has been around for a while, offers a way to control UI state without relying on JavaScript, reducing bundle size and potentially improving performance.\n\n#### The React Approach\n\nLet’s start with a typical example in React:\n\n![hello](https://github.com/deco-cx/community/assets/1753396/6baf8c80-e11a-48fd-a611-cdf7405bfec2)\n\n\n```tsx\nimport { useState } from \"preact/hooks\";\n\nexport default function Section() {\n  const [display, setDisplay] = useState(false);\n\n  return (\n    <div>
\n      <button onClick={() => setDisplay(!display)}>Click me</button>\n      {display && <div>Hello!</div>}\n    </div>\n  );\n}\n```\n\nIn this example, the `useState` hook and the `onClick` handler are used to toggle the visibility of a piece of UI. While this approach is effective, it involves additional JavaScript, which can contribute to a larger bundle size.\n\n#### The Native Web API Approach\n\nNow, let’s refactor this example to use only native web APIs:\n\n```tsx\nexport default function Section() {\n  return (\n    <div>
\n      <input type=\"checkbox\" id=\"toggle\" class=\"hidden peer\" />\n      <label for=\"toggle\">Click me</label>\n      <div class=\"hidden peer-checked:block\">Hello!</div>\n    </div>\n  );\n}\n```\n\n#### Explanation\n\nSo, what’s happening here? We’ve replaced the `useState` hook with a hidden `<input type=\"checkbox\">` element. Here’s a breakdown of the changes:\n\n1. **Hidden Checkbox**: We introduce an `<input type=\"checkbox\">` with an `id` and the `hidden` class. This checkbox will control the state.\n2. **Label for Toggle**: The `