Deployment options

There are multiple ways to deploy your scrapers, depending on your requirements:

One replica that does it all

This is a good option if:

  • you can tolerate some downtime

  • you don’t need to host thousands of scrapers that can be dynamically changed by users

  • you can tolerate losing scraper job information (with in-memory storage it’s gone whenever the replica restarts)

In this case all you need to do is:

  • define a list of scrapers in the code (just like in the tutorial)

  • use in-memory storage (see the sketch after this list)
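
As a minimal sketch, a single-replica setup could look like the following. Note that the module paths, class names, and constructor arguments here are assumptions modeled on the tutorial, not the verified Sneakpeek API — check the tutorial for the exact names:

```python
# Illustrative only: module paths and class names below are assumptions
# modeled on the tutorial, not verified against the real Sneakpeek API.
from sneakpeek.server import SneakpeekServer                  # assumed import
from sneakpeek.storage.in_memory import (                     # assumed import
    InMemoryScrapersStorage,
    InMemoryScraperJobsStorage,
    InMemoryLeaseStorage,
)

from my_project.scrapers import demo_scraper  # your scraper, defined in code

server = SneakpeekServer.create(
    # A static list of scrapers, defined in code just like in the tutorial
    scrapers_storage=InMemoryScrapersStorage([demo_scraper]),
    # Job information lives in memory, so it is lost when the process dies
    jobs_storage=InMemoryScraperJobsStorage(),
    lease_storage=InMemoryLeaseStorage(),
)

server.serve()  # one process runs the worker, scheduler and API together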

Using external storage

If you use external storage (e.g. Redis or an RDBMS) for the jobs queue and lease storage, you’ll be able:

  • to scale workers horizontally until the queue, storage, or scheduler becomes a bottleneck

  • to run secondary replicas of the scheduler, so that if the primary dies there is a fallback ready to take over (see the sketch after this list)
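
Here is a sketch of the same setup with a Redis-backed jobs queue and lease storage, so multiple replicas can share them. Again, the Redis-backed storage class names and their wiring are assumptions, not the verified API:

```python
# Illustrative only: the Redis-backed storage class names are assumptions.
from redis.asyncio import Redis
from sneakpeek.server import SneakpeekServer                  # assumed import
from sneakpeek.storage.in_memory import InMemoryScrapersStorage  # assumed
from sneakpeek.storage.redis import (                         # assumed import
    RedisScraperJobsStorage,
    RedisLeaseStorage,
)

from my_project.scrapers import demo_scraper

redis = Redis(host="redis", port=6379)

server = SneakpeekServer.create(
    scrapers_storage=InMemoryScrapersStorage([demo_scraper]),
    # Shared queue: any number of worker replicas can pull jobs from it
    jobs_storage=RedisScraperJobsStorage(redis),
    # The lease decides which scheduler replica is the active primary
    lease_storage=RedisLeaseStorage(redis),
)

server.serve()
```

Every replica can run this same code: the lease held in Redis arbitrates which scheduler is the primary, while the others stand by as fallbacks.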

If you also use the external storage as the scrapers storage, you’ll be able to dynamically add, update, and delete scrapers via the UI or the JsonRPC API.
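
For illustration, registering a scraper at runtime through the JsonRPC API could look like the snippet below. The endpoint path, method name, and parameter shape are all assumptions — consult the API reference for the real contract:

```python
# Illustrative only: endpoint path, method name and params are assumptions.
import requests

response = requests.post(
    "http://sneakpeek-api:8080/api/v1/jsonrpc",  # assumed endpoint
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "create_scraper",              # assumed method name
        "params": {
            "scraper": {
                "name": "dynamic-demo",
                "handler": "demo_handler",
                "schedule": "every_hour",
            },
        },
    },
    timeout=10,
)
response.raise_for_status()
print(response.json())
```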

Note that by default each Sneakpeek server runs the worker, scheduler, and API services, but it’s possible to run only one role at a time, so you can scale the services independently.
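
To sketch how role separation might look, the snippet below disables all but the worker role on a replica. The with_* toggles are hypothetical flags; the real API may expose role selection differently (e.g. via CLI arguments):

```python
# Illustrative only: the with_* role toggles are hypothetical flags.
from redis.asyncio import Redis
from sneakpeek.server import SneakpeekServer                  # assumed import
from sneakpeek.storage.redis import (                         # assumed import
    RedisScraperJobsStorage,
    RedisLeaseStorage,
)

redis = Redis(host="redis", port=6379)

worker_replica = SneakpeekServer.create(
    with_worker=True,       # this replica only executes scraper jobs
    with_scheduler=False,   # scheduling is handled by dedicated replicas
    with_web_server=False,  # API/UI is served by dedicated replicas
    jobs_storage=RedisScraperJobsStorage(redis),
    lease_storage=RedisLeaseStorage(redis),
)

worker_replica.serve()
```

Scaling then becomes per-role: add worker replicas when jobs queue up, scheduler replicas for failover, and API replicas for request throughput.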