I have millions of records in JSON format that I want to insert into a MongoDB database. I wrote a Java program that reads the JSON file, parses it, and bulk inserts the documents into a MongoDB collection using the insertMany() method. Each bulk insert contains 10,000 documents, and the average document size is 13 kB. After roughly 300,000 documents have been inserted into the collection, the inserts become progressively slower. The collection has no indexes apart from the default one on _id provided by MongoDB.
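For reference, here is a minimal sketch of the batching pattern described above. It is not the actual program: the class name `BatchInserter`, the method `insertInBatches`, and the generic `sink` parameter are hypothetical names chosen for illustration, and the sink stands in for the call to insertMany() so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchInserter {

    /**
     * Drains the iterator in chunks of at most batchSize elements and hands
     * each chunk to the sink. Returns the number of batches passed on.
     * (Hypothetical helper; in the real program the sink would wrap insertMany().)
     */
    public static <T> int insertInBatches(Iterator<T> documents, int batchSize,
                                          Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (documents.hasNext()) {
            batch.add(documents.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                // Start a fresh list so the sink may keep the previous one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in data: integers instead of parsed JSON documents.
        List<Integer> docs = List.of(1, 2, 3, 4, 5);
        int n = insertInBatches(docs.iterator(), 2,
                b -> System.out.println("batch of " + b.size()));
        System.out.println(n + " batches");
    }
}
```

With the MongoDB Java driver, the sink would presumably be something like `collection::insertMany` with a batch size of 10,000, assuming a `MongoCollection` whose document type matches the parsed records.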
I looked through mongod.log to diagnose the problem. It appears that once the collection contains about 300,000 documents, every subsequent bulk insert triggers an aggregate command that performs a COLLSCAN over the entire collection. By the time the collection contained 3,000,000 documents, the COLLSCAN took about 30 seconds. The duration of the bulk insert operation itself does not change; it stays at about 200 ms per 10,000 documents on average.
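To reproduce the log check, a simple filter along these lines could pull out the relevant entries. This is only a sketch under assumptions: the class and method names are hypothetical, and it assumes each log entry is a single line that contains both the strings "aggregate" and "COLLSCAN", without relying on any particular log layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class CollscanFinder {

    /**
     * Returns the log lines that mention both an aggregate command and a
     * COLLSCAN. Plain substring matching; no log format is parsed.
     */
    public static List<String> findCollscanAggregates(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("aggregate") && line.contains("COLLSCAN"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CollscanFinder <path-to-mongod.log>");
            return;
        }
        // Reads the whole file into memory; fine for a sketch, but a very
        // large log would be better streamed with Files.lines().
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findCollscanAggregates(lines).forEach(System.out::println);
    }
}
```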