I have acquired several very large files. Specifically, CSVs of 100+ GB.
I want to search for text in these files faster than manually running grep.
To do this, I need to index the files, right? Would something like Aleph be good for this? It seems like the right tool…
https://github.com/alephdata/aleph
Any other tools for doing this?
I’ve used Java Scanner objects to do this extremely efficiently, with minimal memory required, even with multiple parallel searches (sketch below). Indexing is only worthwhile if you’ll search the data many times and don’t know in advance exactly what you’ll be searching for. For a one-time search it’s not going to help; grep is honestly going to be faster and more efficient for most one-off searches.
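Here's a minimal sketch of that Scanner approach, just to make it concrete (the class name and command-line arguments are placeholders, not anything standard). It streams the file line by line, so memory use stays roughly constant no matter how big the CSV is:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StreamSearch {
    public static void main(String[] args) throws FileNotFoundException {
        // Both arguments are placeholders: the path to the big CSV and
        // the literal text to search for.
        String path = args[0];
        String needle = args[1];

        long lineNo = 0;
        // Scanner reads the file through an internal buffer, so only one
        // line is held in memory at a time regardless of file size.
        try (Scanner in = new Scanner(new File(path))) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                lineNo++;
                if (line.contains(needle)) {
                    System.out.println(lineNo + ": " + line);
                }
            }
        }
    }
}
```

For parallel searches you'd just run one of these per file (or per file chunk); since each pass is disk-bound anyway, as noted below, a handful of them is usually enough to saturate the drive.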
The initial indexing or searching of the files will be bottlenecked by the speed of the disk the files are on, no matter what you do. Indexing only helps because future searches can run against a smaller, faster-to-access structure instead of rescanning the raw files.
So it greatly depends on what you need to search for and how often. The trade-off is the memory and storage the index consumes, and it only pays off across multiple searches of the data you chose to index from the files in the first pass.