MongoDB-TV

GridFS + Full text search

Created by Matias Cascallares

Who am I

  • Originally from Buenos Aires, Argentina
  • Moved several times in the last 5 years: Buenos Aires, Barcelona, Madrid, Amsterdam, Menorca, Singapore
  • Worked for 10 years in web development (mostly)
  • I used to be a Java guy, then I started to work more with Python, PHP and JavaScript
  • Right now working for MongoDB and based in Singapore

A GridFS overview

Specification to store files inside MongoDB

Nothing new!

  • fs.files
  • fs.chunks

    fs.files

    
    var file = {
    	_id : ObjectId("52669591c3d732d92d000002"),
    	filename : "myFile.mov",
    	contentType : "binary/octet-stream",
    	length : 160722190,
    	chunkSize : 8388608,
    	uploadDate : ISODate("2013-10-22T15:11:22.138Z"),
    	aliases : null,
    	metadata : {},
    	md5 : "85217913f05c231b1eac9e1c1492971f"
    }
    						

    fs.chunks

    
    var chunk = {
    	_id : ObjectId("52669591c3d732d92d000003"),
    	n:0,
    	files_id: ObjectId("52669591c3d732d92d000002"),
    	data: '' //binary
    }
    						
    
    
    var chunk0 = {
    	_id : ObjectId("52669591c3d732d92d000003"),
    	n:0,
    	files_id: ObjectId("52669591c3d732d92d000002"),
    	data: '' //binary
    };
    var chunk1 = {
    	_id : ObjectId("52669591c3d732d92d000004"),
    	n:1,
    	files_id: ObjectId("52669591c3d732d92d000002"),
    	data: '' //binary
    };
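
To make the relationship between a file and its chunks concrete, here is a minimal sketch of how content could be split into chunk documents and put back together. It only mimics the layout shown above (`files_id`, `n`, `data`). The helper names `toChunks` and `fromChunks`, the file id string and the tiny chunk size are assumptions for illustration, not the driver's API.

```javascript
// Hypothetical helper: split a Buffer into GridFS-style chunk documents.
function toChunks(filesId, buffer, chunkSize) {
    var chunks = [];
    for (var offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
        chunks.push({
            files_id: filesId,  // every chunk points back to its file document
            n: n,               // position of the chunk within the file
            data: buffer.slice(offset, offset + chunkSize)
        });
    }
    return chunks;
}

// Hypothetical helper: rebuild the content by ordering chunks on n.
function fromChunks(chunks) {
    var ordered = chunks.slice().sort(function (a, b) { return a.n - b.n; });
    return Buffer.concat(ordered.map(function (c) { return c.data; }));
}

var content = Buffer.from('A replica set in MongoDB is a group of mongod processes');
var chunks = toChunks('demo-file-id', content, 16); // 16-byte chunks, demo only
var restored = fromChunks(chunks.slice().reverse()); // storage order does not matter
```

Only the last chunk may be shorter than `chunkSize`; all the others are exactly `chunkSize` bytes long.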
    
    						

    MongoDB-TV

    Inception

    • Bootcamp project
    • A Node.js web application
    • Stores any video in MongoDB using GridFS
    • Streams the videos to the browser
    • Stores subtitles so they can be searched with full text search

    Schema

  • Video file -> GridFS
  • Show and episode metadata -> Document
  • Subtitles -> Document

    Show and episode metadata

    
    var show = {
        _id: ObjectId("528335aecfc1b246d7686c94"),
        name: "M102 for DBAs",
        episodes: [
            {
                _id: ObjectId("528335aecfc1b246d7686c95"),
                created: new Date(),
                season: 4,
                number: 3,
                video: "filename1.mp4", // filename in GridFS
            },
            // ...
        ]
    };
    						

    Subtitle metadata

    
    var subtitle = {
        _id: ObjectId("528335aecfc1b246d7686c96"),
        episode: ObjectId('7834634dca45'), // refers to an episode's _id in show.episodes
        start: 1234,
        end: 1250,
        text: 'A replica set in MongoDB is a group of mongod processes...'
    };
    
    subtitleSchema.index({text : 'text'});
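
Once the text index exists, a search could look roughly like the sketch below. It assumes the MongoDB 2.4 `text` command, a collection named `subtitles`, and a reply carrying a `results` array of `{ score, obj }` entries. The helper names and the sample reply are made up for illustration and are not taken from the project.

```javascript
// Hypothetical helper: build a 2.4-style `text` command document.
function buildTextCommand(collection, terms, limit) {
    return { text: collection, search: terms, limit: limit };
}

// Hypothetical helper: turn a command reply into seek targets for the player.
function toSeekTargets(reply) {
    return (reply.results || []).map(function (r) {
        return { episode: r.obj.episode, start: r.obj.start, score: r.score };
    });
}

var command = buildTextCommand('subtitles', 'replica set', 10);
// In the mongo shell this would be sent with db.runCommand(command).

// Hand-written reply standing in for a real server response.
var sampleReply = {
    results: [
        { score: 1.5, obj: { episode: 'episode-id', start: 1234, end: 1250,
                             text: 'A replica set in MongoDB is a group of mongod processes...' } }
    ],
    ok: 1
};
var targets = toSeekTargets(sampleReply);
```

Each target carries the episode reference and the `start` offset of the matching subtitle, which is enough to jump to the right point in the video.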
    
    						

    Some considerations

    Enable text search

    Text search is not enabled by default in MongoDB 2.4.x; you have to enable it explicitly

    
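    # --fork requires --logpath (or --syslog); the log file name here is arbitrary
    mongod --dbpath data --setParameter textSearchEnabled=true --fork --logpath mongod.log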
    						

    GridFS chunk size

    • 256 KB by default: 300 MB file -> 1200 chunk documents
    • Increased to 8 MB: 300 MB file -> 38 chunk documents
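
The document counts above follow from dividing the file size by the chunk size and rounding up. A small sketch of that arithmetic, assuming binary units (1 MB = 1024 KB) and a hypothetical helper name `chunkCount`:

```javascript
var KB = 1024;
var MB = 1024 * KB;

// Number of chunk documents needed for a file of the given size.
function chunkCount(fileSize, chunkSize) {
    return Math.ceil(fileSize / chunkSize);
}

var withDefault = chunkCount(300 * MB, 256 * KB); // 1200
var withLarger = chunkCount(300 * MB, 8 * MB);    // 38 (37.5 rounded up)
var exampleFile = chunkCount(160722190, 8388608); // 20, for the myFile.mov document
```

Fewer, larger chunks mean fewer documents to read per video, at the cost of loading more data per read.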

    HTTP is your friend

    Use the metadata GridFS already stores (length, content type, MD5, upload date) to give the client caching, expiration and validation mechanisms and so reduce database load

    
    if (metadata && stream) {
        res.status(200);
        res.set({
            'Accept-Ranges': 'bytes',
            'Content-Type': metadata.contentType,
            'Content-Length': metadata.length,
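            // ETag values must be quoted strings
            'ETag': '"' + metadata.md5 + '"',
            // Last-Modified must be an HTTP date, not the default Date string
            'Last-Modified': metadata.uploadDate.toUTCString()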
        });
        stream.pipe(res);
    } else {
        res.status(404).send('Not found');
    }
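
Validation headers only reduce load if the server also answers conditional requests. The sketch below shows one way to decide whether a `304 Not Modified` can be returned before opening the GridFS stream. It assumes the ETag is the quoted MD5 and that request header names are lower-cased (as Node.js does). `isNotModified` is a hypothetical helper, and weak validators (`W/"..."`) are not handled.

```javascript
// Hypothetical helper: true when the client's cached copy is still valid.
function isNotModified(requestHeaders, metadata) {
    var etag = '"' + metadata.md5 + '"';
    var ifNoneMatch = requestHeaders['if-none-match'];
    if (ifNoneMatch) {
        // If-None-Match takes precedence over If-Modified-Since.
        return ifNoneMatch.split(',').some(function (tag) {
            tag = tag.trim();
            return tag === '*' || tag === etag;
        });
    }
    var since = Date.parse(requestHeaders['if-modified-since']);
    if (!isNaN(since)) {
        // HTTP dates have one-second resolution, so drop the milliseconds.
        var modified = Math.floor(metadata.uploadDate.getTime() / 1000) * 1000;
        return modified <= since;
    }
    return false;
}

var metadata = {
    md5: '85217913f05c231b1eac9e1c1492971f',
    uploadDate: new Date('2013-10-22T15:11:22.138Z')
};
var cachedByTag = isNotModified(
    { 'if-none-match': '"85217913f05c231b1eac9e1c1492971f"' }, metadata);
var cachedByDate = isNotModified(
    { 'if-modified-since': 'Tue, 22 Oct 2013 15:11:22 GMT' }, metadata);
var firstVisit = isNotModified({}, metadata);
```

When the helper returns true, the handler could reply with `res.status(304).end()` and skip reading any chunks.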
        					

    Performance

    Source code

    https://github.com/mcascallares/MongoDB-TV

    DISCLAIMER: This is a hackathon-style project; it is not the best code you can find out there

    with a little help from my friends

    THE END

    Thanks