======
GridFS
======

:manual:`GridFS </core/gridfs/>` is a specification for storing and
retrieving files that exceed the :manual:`BSON-document size limit </reference/limits/#limit-bson-document-size>`
of 16 megabytes.

Instead of storing a file in a single document, GridFS divides a file into parts, or chunks,
and stores each of those chunks as a separate document. By default, GridFS limits chunk size
to 255 kilobytes. GridFS uses two collections to store files: the ``chunks`` collection, which
stores the file chunks, and the ``files`` collection, which stores the file metadata.

When you query a GridFS store for a file, the driver or client reassembles the chunks as
needed. GridFS is useful not only for storing files that exceed 16 megabytes, but also for
storing any file that you want to access without loading it entirely into memory.

The Node.js driver supports GridFS with an API that is compatible with
`Node streams <https://nodejs.org/dist/latest/docs/api/stream.html>`_, so you can ``.pipe()``
directly from file streams to MongoDB. In this tutorial, you will see how to use the GridFS
streaming API to upload
`a CC-licensed 28 MB recording of the overture from Richard Wagner's opera Die Meistersinger von Nurnberg <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_
to MongoDB.

Uploading a File
----------------

You can use GridFS to upload a file to MongoDB. This example
assumes that you have a file named ``meistersinger.mp3`` in the
root directory of your project. You can use whichever file you want, or you
can just download a `Die Meistersinger Overture mp3 <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_.

In order to use the streaming GridFS API, you first need to create
a ``GridFSBucket``.

.. code-block:: js

   const { MongoClient, GridFSBucket } = require('mongodb');
   const { createReadStream, createWriteStream } = require('fs');
   const { pipeline } = require('stream');
   const { promisify } = require('util');

   // Allows us to use async/await with streams
   const pipelineAsync = promisify(pipeline);

   const uri = 'mongodb://localhost:27017';

   const client = new MongoClient(uri);

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);
   }

   // Function to connect to the server and run your code
   async function run() {
     try {
       // Connect the client to the server
       await client.connect();
       console.log('Connected successfully to server');

       await main(client);
     } finally {
       // Ensures that the client will close when you finish/error
       await client.close();
     }
   }

   // Runs your code
   run();

The bucket has an ``openUploadStream()`` method that creates an upload stream for a given
file name. You can pipe a Node.js ``fs`` read stream to the upload stream.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }

Assuming that your ``test`` database was empty, you should see that the above
script created two collections in your ``test`` database: ``fs.chunks`` and
``fs.files``. The ``fs.files`` collection contains high-level metadata about
the files stored in this bucket. For instance, the file you just uploaded
has a document that looks like the one below.

.. code-block:: js

   > db.fs.files.findOne()
   {
     "_id" : ObjectId("561fc381e81346c82d6397bb"),
     "length" : 27847575,
     "chunkSize" : 261120,
     "uploadDate" : ISODate("2015-10-15T15:17:21.819Z"),
     "md5" : "2459f1cdec4d9af39117c3424326d5e5",
     "filename" : "meistersinger.mp3"
   }

The above document indicates that the file is named ``meistersinger.mp3``, and tells
you its size in bytes, when it was uploaded, and the
`md5 <https://en.wikipedia.org/wiki/MD5>`_ hash of its contents. There's also a
``chunkSize`` field indicating that the file is broken up into chunks of
261120 bytes, or 255 kilobytes, which is the default.

.. code-block:: js

   > db.fs.chunks.count()
   107

Not surprisingly, 27847575/261120 is approximately 106.64, so the ``fs.chunks``
collection contains 106 chunks of 255 kilobytes each and one final chunk that's roughly
0.64 times 255 kilobytes. Each individual chunk document is similar to the document below.

.. code-block:: js

   > db.fs.chunks.findOne({}, { data: 0 })
   {
     "_id" : ObjectId("561fc381e81346c82d6397bc"),
     "files_id" : ObjectId("561fc381e81346c82d6397bb"),
     "n" : 0
   }
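
You can verify the chunk count from the metadata yourself: the number of chunks is the
file length divided by the chunk size, rounded up. The values below are taken from the
``fs.files`` document shown earlier.

.. code-block:: js

   // Expected chunk count = ceil(file length / chunk size)
   const length = 27847575;   // bytes, from the fs.files document
   const chunkSize = 261120;  // 255 kilobytes, the default
   console.log(Math.ceil(length / chunkSize)); // 107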

The chunk document keeps track of which file it belongs to (``files_id``) and its order in
the list of chunks (``n``). The chunk document also has a ``data`` field that contains
the raw bytes of the file; the query above projects it out for readability.

You can configure both the chunk size and the ``fs`` prefix for the files and
chunks collections at the bucket level. For instance, if you specify the
``chunkSizeBytes`` and ``bucketName`` options as shown below, you'll get
27195 chunks in the ``songs.chunks`` collection.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }

Downloading a File
------------------

Congratulations, you've successfully uploaded a file to MongoDB! However,
a file sitting in MongoDB isn't particularly useful. In order to stream the
file to your hard drive, to an HTTP response, or to npm modules like
`speaker <https://www.npmjs.com/package/speaker>`_, you need
a download stream. The easiest way to get a download stream is
the ``openDownloadStreamByName()`` method.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3'),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }
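
Because the download stream is a standard Node.js readable stream, you can pipe it
anywhere a readable stream is accepted. As a minimal sketch, here's one way you might
serve the file over HTTP; the port and content type are illustrative assumptions, and
``bucket`` is the ``GridFSBucket`` configured as in the examples above.

.. code-block:: js

   const http = require('http');

   // Sketch: stream the stored song directly into each HTTP response
   http.createServer((req, res) => {
     res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
     bucket.openDownloadStreamByName('meistersinger.mp3').pipe(res);
   }).listen(3000);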

Now, you have an ``output.mp3`` file that's a copy of the original
``meistersinger.mp3`` file. The download stream also enables you to do some
neat tricks. For instance, you can cut off the beginning of the song by
specifying a number of bytes to skip. The example below skips the first 41 seconds of
the mp3 and jumps right to the good part of the song.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     // start() sets the byte offset at which the download begins,
     // skipping the first 1585 kilobytes of the file
     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3').start(1024 * 1585),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }

An important performance consideration is that the GridFS
streaming API can't load partial chunks. When a download stream needs to pull a
chunk from MongoDB, it pulls the entire chunk into memory. The 255-kilobyte default
chunk size is usually sufficient, but you can reduce the chunk size to reduce
memory overhead.
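
For instance, here's a sketch of creating a bucket with a smaller chunk size; the
64-kilobyte figure is an arbitrary illustration, and the trade-off is that each file
is split across proportionally more documents in the chunks collection.

.. code-block:: js

   // Illustrative only: 64 KB chunks instead of the 255 KB default.
   // Less memory per chunk read, but more chunk documents per file.
   const bucket = new GridFSBucket(db, { chunkSizeBytes: 64 * 1024 });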