docs(guide): convert /tutorials/gridfs/* to rst

Fixes NODE-2206 Fixes NODE-2207
mongodb · Oct 24, 2019 · 295ea4c
1 parent 31a5c9b
Showing 1 changed file with 207 additions and 0 deletions.
diff --git a/docs/guide/tutorials/gridfs.txt b/docs/guide/tutorials/gridfs.txt
@@ -1,3 +1,210 @@
 ======
 GridFS
 ======
+
+:manual:`GridFS </core/gridfs/>` is a specification for storing and 
+retrieving files that exceed the :manual:`BSON-document size limit </reference/limits/#limit-bson-document-size>` 
+of 16 megabytes.
+
+Instead of storing a file in a single document, GridFS divides the file into parts, or chunks,
+and stores each chunk as a separate document. By default, GridFS uses a chunk size of
+255 kilobytes. GridFS uses two collections to store files: the ``chunks`` collection, which
+stores the file chunks, and the ``files`` collection, which stores the file metadata.
+
+When you query a GridFS store for a file, the driver or client reassembles the chunks as
+needed. GridFS is useful not only for storing files that exceed 16 megabytes but also for
+storing any file that you want to access without loading the entire file into memory.
+
+The Node.js driver supports GridFS with an API that is compatible with
+`Node Streams <https://nodejs.org/dist/latest/docs/api/stream.html>`_, so you can ``.pipe()``
+directly from file streams to MongoDB. In this tutorial, you will see how to use the GridFS
+streaming API to upload
+`a CC-licensed 28 MB recording of the overture from Richard Wagner's opera *Die Meistersinger von Nürnberg* <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_
+to MongoDB.
+
+Uploading a File
+----------------
+
+You can use GridFS to upload a file to MongoDB. This example
+assumes that you have a file named ``meistersinger.mp3`` in the
+root directory of your project. You can use whichever file you want, or you
+can download the `Die Meistersinger overture MP3 <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_.
+
+In order to use the streaming GridFS API, you first need to create
+a ``GridFSBucket``.
+
+.. code-block:: js
+
+   const { MongoClient, GridFSBucket } = require('mongodb');
+   const { createReadStream, createWriteStream } = require('fs');
+   const { pipeline } = require('stream');
+   const { promisify } = require('util');
+
+   // Allows us to use async/await with streams
+   const pipelineAsync = promisify(pipeline);
+
+   const uri = 'mongodb://localhost:27017';
+
+   const client = new MongoClient(uri);
+
+   async function main(client) {
+     const db = client.db('test');
+     const bucket = new GridFSBucket(db);
+   }
+
+   // Function to connect to the server and run your code
+   async function run() {
+     try {
+       // Connect the client to the server
+       await client.connect();
+       console.log('Connected successfully to server');
+
+       await main(client);
+     } finally {
+       // Ensures that the client will close when you finish/error
+       await client.close();
+     }
+   }
+
+   // Runs your code and logs any error instead of leaving the rejection unhandled
+   run().catch(console.error);
+
+
+The bucket has an ``openUploadStream()`` method that creates an upload stream for a given
+file name. You can pipe a Node.js ``fs`` read stream to the upload stream.
+
+.. code-block:: js
+
+   async function main(client) {
+     const db = client.db('test');
+     const bucket = new GridFSBucket(db);
+
+     await pipelineAsync(
+       createReadStream('./meistersinger.mp3'),
+       bucket.openUploadStream('meistersinger.mp3')
+     );
+     console.log('done!');
+   }
+
+Assuming that your ``test`` database was empty, the above
+script creates two collections in it: ``fs.chunks`` and
+``fs.files``. The ``fs.files`` collection contains high-level metadata about
+the files stored in this bucket. For instance, the file you just uploaded
+has a document that looks like the one below.
+
+.. code-block:: js
+
+   > db.fs.files.findOne()
+   {
+       "_id" : ObjectId("561fc381e81346c82d6397bb"),
+       "length" : 27847575,
+       "chunkSize" : 261120,
+       "uploadDate" : ISODate("2015-10-15T15:17:21.819Z"),
+       "md5" : "2459f1cdec4d9af39117c3424326d5e5",
+       "filename" : "meistersinger.mp3"
+   }
+
+The above document indicates that the file is named ``meistersinger.mp3``, and tells
+you its size in bytes, when it was uploaded, and the
+`MD5 <https://en.wikipedia.org/wiki/MD5>`_ hash of the contents. There's also a
+``chunkSize`` field indicating that the file is
+broken up into chunks of 261120 bytes (255 kilobytes), which is the
+default.
+
+.. code-block:: js
+
+   > db.fs.chunks.count()
+   107
+
+Not surprisingly, 27847575/261120 is approximately 106.65, so the ``fs.chunks``
+collection contains 106 full chunks of 255 KB and 1 final chunk of roughly
+0.65 * 255 KB. Each chunk document is similar to the document below.
+
+.. code-block:: js
+
+   > db.fs.chunks.findOne({}, { data: 0 })
+   {
+       "_id" : ObjectId("561fc381e81346c82d6397bc"),
+       "files_id" : ObjectId("561fc381e81346c82d6397bb"),
+       "n" : 0
+   }
+
+The chunk document keeps track of which file it belongs to and its order in
+the list of chunks. The chunk document also has a ``data`` field that contains
+the raw bytes of the file.
+
+You can configure both the chunk size and the ``fs`` prefix for the files and
+chunks collections at the bucket level. For instance, if you specify the
+``chunkSizeBytes`` and ``bucketName`` options as shown below, you'll get
+27195 chunks in the ``songs.chunks`` collection.
+
+.. code-block:: js
+
+   async function main(client) {
+     const db = client.db('test');
+     const bucket = new GridFSBucket(db, {
+       chunkSizeBytes: 1024,
+       bucketName: 'songs'
+     });
+
+     await pipelineAsync(
+       createReadStream('./meistersinger.mp3'),
+       bucket.openUploadStream('meistersinger.mp3')
+     );
+     console.log('done!');
+   }
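+
+The chunk counts quoted above (107 with the default chunk size, 27195 with
+``chunkSizeBytes: 1024``) follow from simple division. The sketch below
+reproduces that arithmetic. Note that ``chunkLayout`` is a hypothetical helper
+written for this illustration and is not part of the driver API.
+
+.. code-block:: js
+
+   // Hypothetical helper (not a driver method): describes how a file of
+   // `length` bytes is split into chunks of `chunkSizeBytes` bytes.
+   function chunkLayout(length, chunkSizeBytes) {
+     const remainder = length % chunkSizeBytes;
+     const chunkCount = Math.ceil(length / chunkSizeBytes);
+     // The last chunk holds the remainder, or a full chunk when the
+     // length is an exact multiple of the chunk size.
+     const lastChunkBytes = chunkCount === 0 ? 0 : remainder || chunkSizeBytes;
+     return { chunkCount, lastChunkBytes };
+   }
+
+   console.log(chunkLayout(27847575, 261120)); // { chunkCount: 107, lastChunkBytes: 168855 }
+   console.log(chunkLayout(27847575, 1024));   // { chunkCount: 27195, lastChunkBytes: 919 }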
+
+Downloading a File
+------------------
+
+Congratulations, you've successfully uploaded a file to MongoDB! However,
+a file sitting in MongoDB isn't particularly useful. To stream the
+file to your hard drive, to an HTTP response, or to npm modules like
+`speaker <https://www.npmjs.com/package/speaker>`_, you need
+a download stream. The easiest way to get a download stream is
+the ``openDownloadStreamByName()`` method.
+
+.. code-block:: js
+
+   async function main(client) {
+     const db = client.db('test');
+     const bucket = new GridFSBucket(db, {
+       chunkSizeBytes: 1024,
+       bucketName: 'songs'
+     });
+
+     await pipelineAsync(
+       bucket.openDownloadStreamByName('meistersinger.mp3'),
+       createWriteStream('./output.mp3')
+     );
+     console.log('done!');
+   }
+
+Now, you have an ``output.mp3`` file that's a copy of the original
+``meistersinger.mp3`` file. The download stream also enables you to do some
+neat tricks. For instance, you can cut off the beginning of the song by
+specifying a number of bytes to skip. You can cut off the first 41 seconds of
+the mp3 and skip right to the good part of the song as shown below.
+
+.. code-block:: js
+
+
+   async function main(client) {
+     const db = client.db('test');
+     const bucket = new GridFSBucket(db, {
+       chunkSizeBytes: 1024,
+       bucketName: 'songs'
+     });
+
+     await pipelineAsync(
+       bucket.openDownloadStreamByName('meistersinger.mp3').start(1024 * 1585),
+       createWriteStream('./output.mp3')
+     );
+     console.log('done!');
+   }
+
+An important performance consideration is that the GridFS
+streaming API can't load partial chunks. When a download stream needs to pull a
+chunk from MongoDB, it pulls the entire chunk into memory. The 255-kilobyte default
+chunk size is usually sufficient, but you can reduce the chunk size to reduce
+memory overhead.
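+
+To make the cost of whole-chunk loading concrete, the sketch below works out
+which chunk a download that skips ``1024 * 1585`` bytes would begin with, and
+how many bytes of that chunk fall before the start offset. It is an
+illustration of the arithmetic only: ``firstChunkFor`` is a hypothetical
+helper, not a driver method, and it assumes chunks are numbered from 0 by the
+``n`` field as in the chunk document shown earlier.
+
+.. code-block:: js
+
+   // Hypothetical helper (not a driver method): locates the chunk that
+   // contains `startByte` and the offset of that byte inside the chunk.
+   function firstChunkFor(startByte, chunkSizeBytes) {
+     return {
+       n: Math.floor(startByte / chunkSizeBytes),
+       bytesBeforeStart: startByte % chunkSizeBytes
+     };
+   }
+
+   // With 1024-byte chunks, the offset lands exactly on a chunk boundary.
+   console.log(firstChunkFor(1024 * 1585, 1024));   // { n: 1585, bytesBeforeStart: 0 }
+   // With the default chunk size, the offset falls inside chunk 6.
+   console.log(firstChunkFor(1024 * 1585, 261120)); // { n: 6, bytesBeforeStart: 56320 }
+
+Under these assumptions, the smaller chunk size means the first chunk read
+contains no skipped bytes, whereas with the default size the first chunk
+includes 56320 bytes that precede the requested start.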