Implementing your own middleware

The interface for middleware is defined in Middleware. There are 3 ways to use middleware:

1. Perform custom logic before the request is processed (implement the on_request method)
2. Perform custom logic before the response is returned to the scraper logic (implement the on_response method)
3. Provide additional functionality for the scraper implementation: the scraper can call any middleware method through ScraperContext. Each middleware is added as an attribute to the passed context, so you can call it like context.<middleware_name>.<middleware_method>(...)

Middleware implementation example

On request middleware

Each request is wrapped in the Request class, and you can modify its parameters before it’s dispatched. Here’s the schema:

@dataclass
class Request:
    method: HttpMethod
    url: str
    headers: HttpHeaders | None = None
    kwargs: dict[str, Any] | None = None

Here’s an example of a middleware that logs each request URL:

import logging
from typing import Any

from pydantic import BaseModel

from sneakpeek.middleware.base import parse_config_from_obj
from sneakpeek.scraper.model import Middleware, Request


# Each middleware can be configured. Its configuration can be
# set globally for all requests or it can be overridden for
# specific scrapers
class MyLoggingMiddlewareConfig(BaseModel):
    some_param: str = "default value"


class MyMiddleware(Middleware):
  """Middleware description"""

  def __init__(self, default_config: MyLoggingMiddlewareConfig | None = None) -> None:
      self._default_config = default_config or MyLoggingMiddlewareConfig()
      self._logger = logging.getLogger(__name__)

  # The name property is mandatory, it's used in scraper config to override
  # middleware configuration for the given scraper
  @property
  def name(self) -> str:
      return "my_middleware"

  async def on_request(self, request: Request, config: Any | None) -> Request:
      # This converts freeform dictionary into a typed config (it's optional)
      config = parse_config_from_obj(
          config,
          self.name,
          MyLoggingMiddlewareConfig,
          self._default_config,
      )
      self._logger.info(f"Making {request.method.upper()} request to {request.url}. {config.some_param}")
      return request

On response middleware

The on_response method receives both the request and the response objects. The response is an aiohttp.ClientResponse object.

Here’s an example of a middleware that logs each response body:

import logging
from typing import Any

import aiohttp
from pydantic import BaseModel

from sneakpeek.middleware.base import parse_config_from_obj
from sneakpeek.scraper.model import Middleware, Request


# Each middleware can be configured. Its configuration can be
# set globally for all requests or it can be overridden for
# specific scrapers
class MyLoggingMiddlewareConfig(BaseModel):
    some_param: str = "default value"


class MyOnResponseMiddleware(Middleware):
  """Middleware description"""

  def __init__(self, default_config: MyLoggingMiddlewareConfig | None = None) -> None:
      self._default_config = default_config or MyLoggingMiddlewareConfig()
      self._logger = logging.getLogger(__name__)

  # The name property is mandatory, it's used in scraper config to override
  # middleware configuration for the given scraper
  @property
  def name(self) -> str:
      return "my_middleware"

  async def on_response(
      self,
      request: Request,
      response: aiohttp.ClientResponse,
      config: Any | None,
  ) -> aiohttp.ClientResponse:
      config = parse_config_from_obj(
          config,
          self.name,
          MyLoggingMiddlewareConfig,
          self._default_config,
      )
      response_body = await response.text()
      self._logger.info(f"Made {request.method.upper()} request to {request.url} - received: status={response.status} body={response_body}")
      return response

Functional middleware

If the middleware doesn’t need to interact with the request or response, you can derive it from BaseMiddleware, so that both the on_request and on_response methods are implemented as pass-through.

Here’s an example of such an implementation:

import logging
from typing import Any

from sneakpeek.middleware.base import BaseMiddleware


class MyFunctionalMiddleware(BaseMiddleware):
  """Middleware description"""

  def __init__(self) -> None:
      self._logger = logging.getLogger(__name__)

  # The name property is mandatory, it's used in scraper config to override
  # middleware configuration for the given scraper
  @property
  def name(self) -> str:
      return "my_middleware"

  # This function will be available for scrapers by using
  # `context.my_middleware.custom_func(some_arg)`
  def custom_func(self, arg1: Any) -> Any:
      # `do_something` is a placeholder for your own logic
      return do_something(arg1)
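
To illustrate how a scraper might reach such a method, here is a self-contained sketch. It does not use the library: SimpleNamespace stands in for ScraperContext (assuming only that each middleware is exposed as an attribute named after its name property, as described at the top of this page), and scraper_handler and the body of custom_func are hypothetical.

```python
import asyncio
from types import SimpleNamespace
from typing import Any


class MyFunctionalMiddleware:
    """Stand-in for the class above, without the library base class."""

    @property
    def name(self) -> str:
        return "my_middleware"

    def custom_func(self, arg1: Any) -> Any:
        # Placeholder logic for illustration only
        return str(arg1).strip().lower()


# Hypothetical stand-in for ScraperContext: each middleware is exposed
# as an attribute named after its `name` property.
middleware = MyFunctionalMiddleware()
context = SimpleNamespace(**{middleware.name: middleware})


async def scraper_handler(context: Any) -> str:
    # The scraper calls the middleware method through the context
    return context.my_middleware.custom_func("  Hello World ")


result = asyncio.run(scraper_handler(context))
print(result)
```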