Commit

fixing bug in posts
Prof-ThiagoOliveira committed Jul 10, 2024
1 parent fcfdaba commit 9c36336
Showing 5 changed files with 10 additions and 20 deletions.
6 changes: 2 additions & 4 deletions content/post/2024-07-03-read_write_big_data/index.Rmd
@@ -250,9 +250,7 @@ generate_sample_data <- function(n) {
}

# Define dataset sizes
-dataset_sizes <- c(1e2, 1e3
-
-, 1e4, 1e5, 1e6, 1e7, 1e8)
+dataset_sizes <- c(1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8)
```

## Benchmarking Read and Write Performance
@@ -272,7 +270,7 @@ setDTthreads(num_threads)
Thus, we defined functions for writing to and reading from each data format, incorporating multi-threading where supported. These functions were then used in the benchmarking process.

```r
-# Define file writing and reading functions with threading support
+# Definition of writing and reading functions with threading support
write_rds <- function(data, file) saveRDS(data, file)
write_dt <- function(data, file) data.table::fwrite(data, file, nThread = num_threads)
write_fst <- function(data, file) fst::write_fst(data, file)
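
The hunk above shows only the write-side wrappers; the read-side functions are cut off in this view. As a minimal sketch of how such a write/read pair could be timed, the following uses base R only so that it runs without extra packages. `read_rds`, `time_roundtrip`, and `sample_data` are hypothetical names introduced for illustration, and `system.time()` stands in for the `microbenchmark` calls the post describes.

```r
# Sketch only: base R, no extra packages required.
# read_rds, time_roundtrip and sample_data are hypothetical names,
# not taken from the post.
write_rds <- function(data, file) saveRDS(data, file)
read_rds  <- function(file) readRDS(file)

time_roundtrip <- function(data, writer, reader) {
  file <- tempfile(fileext = ".rds")
  on.exit(unlink(file), add = TRUE)  # remove the temporary file afterwards

  # Elapsed wall-clock seconds for one write and one read
  write_time <- system.time(writer(data, file))[["elapsed"]]
  read_time  <- system.time(restored <- reader(file))[["elapsed"]]

  list(write = write_time, read = read_time, restored = restored)
}

set.seed(1)
sample_data <- data.frame(id = seq_len(1e4), value = rnorm(1e4))
result <- time_roundtrip(sample_data, write_rds, read_rds)
print(result[c("write", "read")])
```

A single `system.time()` call is far noisier than the 100 repetitions mentioned in the post, so this is suitable only for checking that a wrapper pair round-trips the data, not for comparing formats.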
6 changes: 2 additions & 4 deletions content/post/2024-07-03-read_write_big_data/index.html
@@ -250,9 +250,7 @@ <h1>Data Simulation</h1>
}

# Define dataset sizes
-dataset_sizes &lt;- c(1e2, 1e3
-
-, 1e4, 1e5, 1e6, 1e7, 1e8)</code></pre>
+dataset_sizes &lt;- c(1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8)</code></pre>
<div id="benchmarking-read-and-write-performance" class="section level2">
<h2>Benchmarking Read and Write Performance</h2>
<p>The benchmarking process involves measuring the time taken to read from and write to different data formats. We used the <code>microbenchmark</code> package in <code>R</code> to perform these measurements, ensuring that each operation is repeated 100 times to obtain reliable statistics. Additionally, we considered multi-threading capabilities where applicable, leveraging the maximum number of available cores to enhance performance.</p>
@@ -263,7 +261,7 @@ <h2>Benchmarking Read and Write Performance</h2>
# Setting threads for data.table
setDTthreads(num_threads)</code></pre>
<p>Thus, we defined functions for writing to and reading from each data format, incorporating multi-threading where supported. These functions were then used in the benchmarking process.</p>
-<pre class="r"><code># Define file writing and reading functions with threading support
+<pre class="r"><code># Definition of writing and reading functions with threading support
write_rds &lt;- function(data, file) saveRDS(data, file)
write_dt &lt;- function(data, file) data.table::fwrite(data, file, nThread = num_threads)
write_fst &lt;- function(data, file) fst::write_fst(data, file)
6 changes: 2 additions & 4 deletions content/post/2024-07-03-read_write_big_data/index.md
@@ -250,9 +250,7 @@ generate_sample_data <- function(n) {
}

# Define dataset sizes
-dataset_sizes <- c(1e2, 1e3
-
-, 1e4, 1e5, 1e6, 1e7, 1e8)
+dataset_sizes <- c(1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8)
```

## Benchmarking Read and Write Performance
@@ -272,7 +270,7 @@ setDTthreads(num_threads)
Thus, we defined functions for writing to and reading from each data format, incorporating multi-threading where supported. These functions were then used in the benchmarking process.

```r
-# Define file writing and reading functions with threading support
+# Definition of writing and reading functions with threading support
write_rds <- function(data, file) saveRDS(data, file)
write_dt <- function(data, file) data.table::fwrite(data, file, nThread = num_threads)
write_fst <- function(data, file) fst::write_fst(data, file)
6 changes: 2 additions & 4 deletions public/index.xml
@@ -489,9 +489,7 @@ img {
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Define dataset sizes&lt;/span&gt;
-&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;dataset_sizes&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1e2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e3&lt;/span&gt;
-&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
-&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e6&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e7&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
+&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;dataset_sizes&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1e2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e6&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e7&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;m&#34;&gt;1e8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;benchmarking-read-and-write-performance&#34;&gt;Benchmarking Read and Write Performance&lt;/h2&gt;
&lt;p&gt;The benchmarking process involves measuring the time taken to read from and write to different data formats. We used the &lt;code&gt;microbenchmark&lt;/code&gt; package in &lt;code&gt;R&lt;/code&gt; to perform these measurements, ensuring that each operation is repeated 100 times to obtain reliable statistics. Additionally, we considered multi-threading capabilities where applicable, leveraging the maximum number of available cores to enhance performance.&lt;/p&gt;
&lt;p&gt;Multi-threading can significantly impact the performance of read and write operations. We determined the number of available threads using &lt;code&gt;parallel::detectCores()&lt;/code&gt; and configured the relevant functions to use this information. For instance, the &lt;code&gt;fwrite&lt;/code&gt; function from &lt;code&gt;data.table&lt;/code&gt; and &lt;code&gt;qsave&lt;/code&gt; from the &lt;code&gt;qs&lt;/code&gt; package support multi-threading, which we enabled in our benchmarking process.&lt;/p&gt;
@@ -501,7 +499,7 @@ img {
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Setting threads for data.table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nf&#34;&gt;setDTthreads&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;num_threads&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Thus, we defined functions for writing to and reading from each data format, incorporating multi-threading where supported. These functions were then used in the benchmarking process.&lt;/p&gt;
-&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Define file writing and reading functions with threading support&lt;/span&gt;
+&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Definition of writing and reading functions with threading support&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;write_rds&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;function&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;saveRDS&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;write_dt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;function&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;data.table&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;nf&#34;&gt;fwrite&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;nThread&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;num_threads&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;write_fst&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;function&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fst&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;nf&#34;&gt;write_fst&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
6 changes: 2 additions & 4 deletions public/post/data-read-write-performance/index.html
@@ -1993,9 +1993,7 @@ <h1 id="data-simulation">Data Simulation</h1>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Define dataset sizes</span>
-</span></span><span class="line"><span class="cl"><span class="n">dataset_sizes</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="m">1e2</span><span class="p">,</span> <span class="m">1e3</span>
-</span></span><span class="line"><span class="cl">
-</span></span><span class="line"><span class="cl"><span class="p">,</span> <span class="m">1e4</span><span class="p">,</span> <span class="m">1e5</span><span class="p">,</span> <span class="m">1e6</span><span class="p">,</span> <span class="m">1e7</span><span class="p">,</span> <span class="m">1e8</span><span class="p">)</span>
+</span></span><span class="line"><span class="cl"><span class="n">dataset_sizes</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="m">1e2</span><span class="p">,</span> <span class="m">1e3</span><span class="p">,</span> <span class="m">1e4</span><span class="p">,</span> <span class="m">1e5</span><span class="p">,</span> <span class="m">1e6</span><span class="p">,</span> <span class="m">1e7</span><span class="p">,</span> <span class="m">1e8</span><span class="p">)</span>
</span></span></code></pre></div><h2 id="benchmarking-read-and-write-performance">Benchmarking Read and Write Performance</h2>
<p>The benchmarking process involves measuring the time taken to read from and write to different data formats. We used the <code>microbenchmark</code> package in <code>R</code> to perform these measurements, ensuring that each operation is repeated 100 times to obtain reliable statistics. Additionally, we considered multi-threading capabilities where applicable, leveraging the maximum number of available cores to enhance performance.</p>
<p>Multi-threading can significantly impact the performance of read and write operations. We determined the number of available threads using <code>parallel::detectCores()</code> and configured the relevant functions to use this information. For instance, the <code>fwrite</code> function from <code>data.table</code> and <code>qsave</code> from the <code>qs</code> package support multi-threading, which we enabled in our benchmarking process.</p>
@@ -2005,7 +2003,7 @@ <h1 id="data-simulation">Data Simulation</h1>
</span></span><span class="line"><span class="cl"><span class="c1"># Setting threads for data.table</span>
</span></span><span class="line"><span class="cl"><span class="nf">setDTthreads</span><span class="p">(</span><span class="n">num_threads</span><span class="p">)</span>
</span></span></code></pre></div><p>Thus, we defined functions for writing to and reading from each data format, incorporating multi-threading where supported. These functions were then used in the benchmarking process.</p>
-<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Define file writing and reading functions with threading support</span>
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="c1"># Definition of writing and reading functions with threading support</span>
</span></span><span class="line"><span class="cl"><span class="n">write_rds</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span> <span class="nf">saveRDS</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">write_dt</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span> <span class="n">data.table</span><span class="o">::</span><span class="nf">fwrite</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">,</span> <span class="n">nThread</span> <span class="o">=</span> <span class="n">num_threads</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">write_fst</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span> <span class="n">fst</span><span class="o">::</span><span class="nf">write_fst</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span>
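
The same `dataset_sizes` change is repeated in all five files. R continues an expression across line breaks while a parenthesis is still open, so the removed three-line form and the added one-line form should evaluate to the same vector. That suggests the change affects how the code renders in the post rather than what it computes, although the commit message does not say so. A small check, with the two spellings held as strings (`old_src`, `new_src`, and the environments are names introduced here for illustration):

```r
# The two spellings of the assignment, held as text so both can be parsed.
old_src <- "dataset_sizes <- c(1e2, 1e3\n\n, 1e4, 1e5, 1e6, 1e7, 1e8)"
new_src <- "dataset_sizes <- c(1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8)"

# Evaluate each version in its own environment to keep the results apart.
old_env <- new.env()
new_env <- new.env()
eval(parse(text = old_src), envir = old_env)
eval(parse(text = new_src), envir = new_env)

# Compare the resulting vectors element for element.
same_value <- identical(old_env$dataset_sizes, new_env$dataset_sizes)
print(same_value)
```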