
Requirements for different archive sizes

Thomas Egense edited this page Oct 23, 2024 · 1 revision

Here follows a short overview of hardware requirements for a fresh SolrWayback setup. This wiki page needs elaboration.
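Since the sections below are keyed on the total size of the WARCs, it can help to measure that first. The following is a minimal sketch, not part of SolrWayback: the function names and the suffix list are hypothetical, and it counts on-disk bytes, so `.warc.gz` files contribute their compressed size. The page does not state whether compressed or uncompressed size is meant.

```python
from pathlib import Path

# Assumption: WARCs are stored as *.warc or *.warc.gz files.
WARC_SUFFIXES = (".warc", ".warc.gz")


def total_warc_bytes(root):
    """Sum the on-disk sizes of all WARC files below root, recursively."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.name.lower().endswith(WARC_SUFFIXES)
    )


def human_size(n_bytes):
    """Format a byte count using decimal units (1 TB = 10**12 bytes)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} PB"
```

Calling `human_size(total_warc_bytes("/path/to/warcs"))` on the collection folder (the path here is a placeholder) gives a figure to compare against the size ranges below.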

0-100GB of WARCs

The index workflow, search engine and frontend should be able to run within a total of 4GB of RAM on just about any current machine. In case of a crash: reindex.

100GB-1TB of WARCs

SSD highly recommended, 4 CPUs, 8GB of RAM (this still needs testing and might turn out to be 10-12GB), single-machine setup or 2 machines for redundancy, WARC index logistics from the command line.

1TB-50TB of WARCs, single collection

SSD essential, RAM for caching, separation of index & search, multi-machine setup, fully live index. WARC index logistics are possible from the command line, but consider Hadoop/netsearch/a generic workflow engine.

1TB-50TB of WARCs, multi collection

Same as single collection, but consider freezing finished collections

50TB-1PB of WARCs

As above, but an automated logistics system and freezing of finished collections are highly recommended. Focus on the practical limitations of Solr sharding.

1PB-10PB of WARCs

If everything is to be searched in the same cloud, a strong focus is needed on freezing and on balancing a minimal shard/collection count against the single-shard size maximum of ~1TB.
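The trade-off between shard count and shard size comes down to simple arithmetic, sketched below. This is a rough illustration only: the function name is hypothetical, the ~1TB figure is the single-shard maximum mentioned above (taken here as 10**12 bytes), and the total index size has to be supplied by the reader, since this page gives no ratio between WARC size and index size.

```python
# Assumption: "~1TB" is read as 10**12 bytes.
MAX_SHARD_BYTES = 10**12


def min_shard_count(index_bytes, max_shard_bytes=MAX_SHARD_BYTES):
    """Smallest number of shards that keeps every shard at or below the cap."""
    if index_bytes < 0 or max_shard_bytes <= 0:
        raise ValueError("index_bytes must be >= 0 and max_shard_bytes > 0")
    # Integer ceiling division avoids floating-point rounding at large sizes.
    return (index_bytes + max_shard_bytes - 1) // max_shard_bytes
```

For example, a total index of 250 * 10**12 bytes needs at least 250 shards under this cap. That is a lower bound on the shard count, which is why freezing finished collections and keeping shards close to the size maximum matter at this scale.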

10PB+ of WARCs

Uncharted territory. Trivial to do by using multiple separate clouds, but hard if full corpus search is needed. Can be helped by compromising on indexed text size and features.
