inference_server

Pluggable Python HTTP web service (WSGI) for real-time AI/ML model inference compatible with Amazon SageMaker

class BatchStrategy(value)

Bases: Enum

Enumeration of Batch Transform invocation strategies

Specifies the number of records to include in a mini-batch for an HTTP inference request. A record is a single unit of input data that inference can be made on. For example, a single line in a CSV file is a record.

See: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html#sagemaker-CreateTransformJob-request-BatchStrategy

MULTI_RECORD = 'MultiRecord'

Batch Transform job to invoke the model with multiple records per request

SINGLE_RECORD = 'SingleRecord'

Batch Transform job to invoke the model with a single record per request
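As a rough illustration of the difference between the two strategies, the sketch below uses a local stand-in enum that mirrors the two documented members, so that it runs without the package installed. The `records_per_request` helper is hypothetical and is not part of inference_server. It also simplifies the multi-record case by putting every record into one request, whereas a real Batch Transform job additionally limits the payload size.

```python
from enum import Enum


class BatchStrategy(Enum):
    """Local stand-in mirroring the two documented members."""

    MULTI_RECORD = "MultiRecord"
    SINGLE_RECORD = "SingleRecord"


def records_per_request(strategy, lines):
    """Group CSV lines into mini-batches (hypothetical helper, illustration only)."""
    if strategy is BatchStrategy.SINGLE_RECORD:
        # One record (one CSV line) per HTTP inference request
        return [[line] for line in lines]
    # Simplification: all records in a single request, ignoring payload limits
    return [list(lines)]


# Enum members can be looked up by their string value
strategy = BatchStrategy("SingleRecord")
batches = records_per_request(strategy, ["1,2", "3,4"])
```

With the real package, the same member lookup by value would presumably apply to `inference_server.BatchStrategy`, since it is documented as an Enum with these values.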

class MIMEAccept(values: Accept | Iterable[tuple[str, float]] | None = ())

Bases: Accept

Like Accept but with special methods and behavior for mimetypes.

property accept_html: bool

True if this object accepts HTML.

property accept_json: bool

True if this object accepts JSON.

property accept_xhtml: bool

True if this object accepts XHTML.
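The constructor accepts an iterable of `(str, float)` tuples, that is, mimetype and quality pairs. The standard-library sketch below shows how such pairs could be derived from an HTTP `Accept` header and how a JSON check might work. Both `parse_accept` and `accepts_json` are hypothetical helpers written for this illustration. They are not the class's implementation, and they ignore wildcards such as `*/*`.

```python
def parse_accept(header):
    """Parse an Accept header into (mimetype, quality) pairs, best first."""
    pairs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        mimetype, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparseable quality as not acceptable
        pairs.append((mimetype.lower(), quality))
    # sorted() is stable, so entries of equal quality keep their header order
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def accepts_json(pairs):
    """True if application/json is listed with a non-zero quality."""
    return any(m == "application/json" and q > 0 for m, q in pairs)


pairs = parse_accept("text/html;q=0.8, application/json")
```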

create_app() -> WSGIApplication

Initialize and return the WSGI application

This is the WSGI application factory function that needs to be passed to a WSGI-compatible web server.
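To make the factory pattern concrete, here is a minimal sketch that calls a WSGI application in-process using only the standard library. The `create_app` defined here is a stand-in so the example runs without the package. In real use it would be replaced by `inference_server.create_app`. The `call_wsgi` helper and the canned "pong" response are assumptions made for illustration.

```python
from wsgiref.util import setup_testing_defaults


def create_app():
    """Stand-in factory; in real use this would be inference_server.create_app."""

    def app(environ, start_response):
        body = b"pong"
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]

    return app


def call_wsgi(app, path="/ping", method="GET"):
    """Invoke a WSGI app once and return (status, body); hypothetical helper."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_wsgi(create_app())
```

A WSGI server that supports application factories could be pointed at the factory directly. With Gunicorn, for example, this would likely look like `gunicorn "inference_server:create_app()"`.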

warmup() -> None

Initialize any additional resources upfront

This will call the model_fn plugin hook.
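The idea of warming up can be sketched as follows, assuming the loaded model is cached after the first call. That caching is an assumption of this example and is not stated above. The `_model` loader and the dummy model are hypothetical stand-ins for whatever the `model_fn` hook returns, and none of this is the package's own code.

```python
import functools

load_count = 0  # counts how often the (expensive) load actually runs


@functools.lru_cache(maxsize=None)
def _model():
    """Hypothetical cached loader standing in for a call to the model_fn hook."""
    global load_count
    load_count += 1
    return {"weights": [0.1, 0.2]}


def warmup() -> None:
    """Load the model now so the first request does not pay the loading cost."""
    _model()


warmup()
warmup()  # the second call hits the cache, so the model is loaded only once
model = _model()
```

Calling a warm-up function at server start, before requests arrive, moves the model loading cost out of the first inference request.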