Deploying ONNX Runtime Web
This document provides some guidance on how to deploy ONNX Runtime Web in a production environment.
Assets
When deploying ONNX Runtime Web in a production environment, the following assets are required:
- JavaScript code bundle: The bundle that contains the application code and, depending on how the application is built, possibly the ONNX Runtime Web JavaScript code as well.
- WebAssembly binaries: The WebAssembly binary file(s) of the ONNX Runtime Web library.
- Model file(s): The ONNX model file(s) that you want to run in the browser.
JavaScript code bundle
The JavaScript code bundle is usually a minified JavaScript file that contains the application code, generated by a bundler such as Webpack, Rollup or ESBuild. Depending on the bundler’s configuration, the ONNX Runtime Web JavaScript code may be included in the bundle or not (if specified as an external dependency).
Conditional Importing
To reduce the size of the JavaScript code bundle, you can use Conditional Importing to import only the necessary parts of the ONNX Runtime Web library. For example, if you only use the WebAssembly execution provider, importing onnxruntime-web/wasm reduces the size of the JavaScript code bundle.
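As a rough sketch of how this might look, the snippet below loads the WebAssembly-only entry point lazily with a dynamic import. The specifier onnxruntime-web/wasm comes from the text above; the function name `loadWasmOnlyOrt` and its `importer` parameter are hypothetical and exist only so the function can be exercised without the package installed.

```javascript
// Lazily load the WebAssembly-only entry point of ONNX Runtime Web.
// The specifier is a string literal, so a bundler can resolve
// 'onnxruntime-web/wasm' at build time.
// `importer` is a hypothetical injection point (an assumption for this
// sketch); applications would normally call loadWasmOnlyOrt() without it.
async function loadWasmOnlyOrt(importer = () => import('onnxruntime-web/wasm')) {
  const ort = await importer();
  return ort;
}
```

A static `import * as ort from 'onnxruntime-web/wasm'` at the top of a module uses the same entry point; the dynamic form only defers loading until the runtime is needed.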
Worker loading
ONNX Runtime Web can load two workers at runtime:
- The web worker for the proxy feature. The ONNX Runtime Web JavaScript code itself can be loaded as the entry point of this worker.
- The web worker for the WebAssembly multi-threading feature. The Emscripten-generated JavaScript files can be loaded as the entry point of this worker.
When deployed in a same-origin environment, the workers are loaded directly from the script URL, which allows them to load in Content Security Policy (CSP) restricted environments. When deployed in a cross-origin environment, for example when the workers are loaded from a CDN, they cannot be loaded directly from the script URL because of the same-origin policy. In this case, the script is fetched and the workers are loaded from an object URL created from the fetch response.
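The two loading paths described above can be illustrated with a small sketch. This is not the ONNX Runtime Web implementation; the function names and the `env` parameter (used to inject `Worker`, `fetch` and `URL`) are assumptions made for illustration.

```javascript
// Returns true when `scriptUrl` resolves to the same origin as `pageUrl`.
function isSameOrigin(scriptUrl, pageUrl) {
  return new URL(scriptUrl, pageUrl).origin === new URL(pageUrl).origin;
}

// Sketch: start a worker directly for same-origin scripts; otherwise fetch
// the script and start the worker from an object URL.
// `env` defaults to the browser globals and is a hypothetical injection point.
async function createWorkerFor(scriptUrl, pageUrl, env = globalThis) {
  if (isSameOrigin(scriptUrl, pageUrl)) {
    return new env.Worker(scriptUrl);
  }
  const response = await env.fetch(scriptUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch worker script: HTTP ${response.status}`);
  }
  const blob = await response.blob();
  const objectUrl = env.URL.createObjectURL(blob);
  return new env.Worker(objectUrl);
}
```

In a CSP restricted environment the object URL path may be blocked unless the policy allows `blob:` workers, which is one reason to prefer same-origin hosting there.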
WebAssembly binaries
The standard ONNX Runtime Web library includes the following WebAssembly binary files:
File | SIMD | Multi-threading | JSEP | Training |
---|---|---|---|---|
ort-wasm-simd-threaded.wasm | ✔️ | ✔️ | ❌ | ❌ |
ort-wasm-simd-threaded.jsep.wasm | ✔️ | ✔️ | ✔️ | ❌ |
ort-training-wasm-simd-threaded.wasm | ✔️ | ✔️ | ❌ | ✔️ |
The columns indicate whether the feature is supported by the WebAssembly artifact.
- SIMD: whether the Single Instruction, Multiple Data (SIMD) feature is supported.
- Multi-threading: whether the WebAssembly multi-threading feature is supported.
- JSEP: whether the JavaScript Execution Provider (JSEP) feature is enabled. This feature powers the WebGPU and WebNN execution providers.
- Training: whether the training feature is enabled.
When deploying ONNX Runtime Web in a production environment, you should consider which WebAssembly binary file(s) to include in the application. Here are some considerations:
- When using the training feature, the ort-training-wasm-simd-threaded.wasm file is used.
- When using the WebGPU or WebNN execution provider, the ort-wasm-simd-threaded.jsep.wasm file is used.
- Otherwise, the ort-wasm-simd-threaded.wasm file is used.
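The selection rules above can be written down as a small helper, for example in a build script that copies only the binary an application needs. The function name `selectWasmBinary` and its options object are hypothetical; the file names and the rule order are taken from the list above.

```javascript
// Pick the WebAssembly binary an application needs, following the rules
// above in order: training first, then WebGPU/WebNN (JSEP), then the default.
// `selectWasmBinary` and its options shape are assumptions for this sketch.
function selectWasmBinary({ training = false, executionProviders = [] } = {}) {
  if (training) {
    return 'ort-training-wasm-simd-threaded.wasm';
  }
  const needsJsep = executionProviders.some((ep) => ep === 'webgpu' || ep === 'webnn');
  return needsJsep ? 'ort-wasm-simd-threaded.jsep.wasm' : 'ort-wasm-simd-threaded.wasm';
}
```

For example, `selectWasmBinary({ executionProviders: ['webgpu', 'wasm'] })` selects the JSEP binary, while calling it with no arguments selects the default one.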
Ensure the WebAssembly binary file(s) are correctly served
You should ensure that the WebAssembly binary file(s) are correctly served on the server. If you didn’t copy the necessary WebAssembly binary file(s) when building the application, or if the WebAssembly binary file(s) are not in the expected path, ONNX Runtime Web will fail to initialize.
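One way to diagnose a serving problem is to request the file and confirm that the response really is a WebAssembly binary, since a server may answer a missing file with an HTML error or fallback page. The sketch below checks the HTTP status and the 4-byte WebAssembly magic number; the function names and the `fetchImpl` parameter are hypothetical, and the check downloads the whole file, so it is meant for diagnostics rather than for production code paths.

```javascript
// Every WebAssembly binary starts with the bytes 0x00 'a' 's' 'm'.
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

function looksLikeWasm(bytes) {
  return bytes.length >= 4 && WASM_MAGIC.every((b, i) => bytes[i] === b);
}

// Fetch `url` and report whether it serves a WebAssembly binary.
// `fetchImpl` is a hypothetical injection point; it defaults to global fetch.
async function checkWasmServed(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    return { ok: false, reason: `HTTP ${response.status}` };
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  if (!looksLikeWasm(bytes)) {
    return { ok: false, reason: 'response is not a WebAssembly binary' };
  }
  return { ok: true, reason: '' };
}
```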
Override WebAssembly file path
ONNX Runtime Web tries to locate the WebAssembly binary file(s) by using the relative path of the JavaScript code bundle. If the WebAssembly binary file(s) are not located in the same directory as the JavaScript code bundle, you can override the file path by setting the value of ort.env.wasm.wasmPaths.
You can also set ort.env.wasm.wasmPaths to an absolute URL on a public CDN, such as jsdelivr or unpkg, if you are using a release version of ONNX Runtime Web:
// Set the WebAssembly binary file path to jsdelivr CDN for latest dev version
ort.env.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web@dev/dist/';
// Set the WebAssembly binary file path to unpkg CDN for latest dev version
ort.env.wasm.wasmPaths = 'https://unpkg.com/onnxruntime-web@dev/dist/';
See API reference: env.wasm.wasmPaths for more details.
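The helpers below sketch two variations on the snippets above: pointing wasmPaths at a self-hosted directory, and pinning the CDN URL to an explicit package version instead of the dev tag. The helper names are hypothetical, and the trailing-slash handling assumes that a string value is treated as a URL prefix to which the file names are appended; check the API reference before relying on that.

```javascript
// Ensure the value ends with '/', assuming it is used as a URL prefix.
function withTrailingSlash(url) {
  return url.endsWith('/') ? url : `${url}/`;
}

// Build a jsdelivr URL pinned to an explicit onnxruntime-web version.
// (Hypothetical helper; the URL layout follows the snippet above.)
function jsdelivrDistUrl(version) {
  return `https://cdn.jsdelivr.net/npm/onnxruntime-web@${version}/dist/`;
}

// `ort` is the imported ONNX Runtime Web module.
function setWasmPaths(ort, baseUrl) {
  ort.env.wasm.wasmPaths = withTrailingSlash(baseUrl);
  return ort.env.wasm.wasmPaths;
}
```

For example, `setWasmPaths(ort, '/assets/ort')` targets a self-hosted directory (the path is illustrative), and `setWasmPaths(ort, jsdelivrDistUrl(version))` targets the CDN, where `version` should match the installed package. In both cases the value should be set before the first inference session is created.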
Model file(s)
If your ONNX model file(s) are large and take a while to download, consider using IndexedDB to cache them so that the model is not downloaded again every time the page is refreshed.
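A minimal sketch of such a cache is shown below. It is one possible approach, not an API provided by ONNX Runtime Web: `openModelStore`, `loadModelCached`, the database name and the store name are all hypothetical, and the cache is keyed by URL, so a changed model needs a changed URL (or a separate invalidation step).

```javascript
// Minimal IndexedDB-backed key-value store (browser only).
function openModelStore(dbName = 'model-cache', storeName = 'models') {
  const dbPromise = new Promise((resolve, reject) => {
    const request = indexedDB.open(dbName, 1);
    request.onupgradeneeded = () => request.result.createObjectStore(storeName);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
  // Run one request against the object store and resolve with its result.
  const run = async (mode, action) => {
    const db = await dbPromise;
    return new Promise((resolve, reject) => {
      const request = action(db.transaction(storeName, mode).objectStore(storeName));
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  };
  return {
    get: (key) => run('readonly', (store) => store.get(key)),
    set: (key, value) => run('readwrite', (store) => store.put(value, key)),
  };
}

// Return the model bytes from the cache, downloading them on a miss.
// `store` needs async get/set methods; `fetchImpl` defaults to global fetch.
async function loadModelCached(url, store, fetchImpl = fetch) {
  const cached = await store.get(url);
  if (cached) {
    return cached;
  }
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to download model: HTTP ${response.status}`);
  }
  const buffer = await response.arrayBuffer();
  await store.set(url, buffer);
  return buffer;
}
```

The returned buffer can then be wrapped in a `Uint8Array` and passed to `ort.InferenceSession.create` in place of a URL.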
If the model contains external data, you need to pass the external data information to ONNX Runtime Web. See External Data for more details.
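As an illustration, the sketch below passes external data through the session options. It assumes the externalData option described on the External Data page, where each entry pairs the path recorded inside the model with the location the data is served from; the helper name and the shape of its `externalFiles` argument are hypothetical.

```javascript
// `ort` is the imported ONNX Runtime Web module.
// `externalFiles` maps the path recorded inside the model file to the URL
// (or data) that the external data is loaded from.
function createSessionWithExternalData(ort, modelUrl, externalFiles) {
  const externalData = Object.entries(externalFiles).map(([path, data]) => ({ path, data }));
  return ort.InferenceSession.create(modelUrl, { externalData });
}
```

A call might look like `createSessionWithExternalData(ort, './model.onnx', { './model.onnx.data': '/models/model.onnx.data' })`, where both file names are placeholders.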
File size considerations
The size of the artifacts is an important factor to consider when deploying ONNX Runtime Web in a production environment. Reducing the file size can improve the load time of the application and reduce the memory consumption on the client’s device.
To reduce the deployment size, you can consider the following options:
- Use Conditional Importing to import only the necessary parts of the ONNX Runtime Web library.
- Serve only the necessary WebAssembly binaries, or use ort.env.wasm.wasmPaths to set the WebAssembly binary file path to a public CDN.
If you want ultimate control over the size of the artifacts, you can also perform a custom build of ONNX Runtime Web.
Custom build
With a custom build, you can build ONNX Runtime Web with only the kernels that are required by your model, which can significantly reduce the size of the WebAssembly binary file(s). However, the steps are more complex and require some knowledge of the ONNX Runtime Web build system.
The content of this part is under construction.
Security considerations
Secure Context
WebGPU is accessible only in secure contexts. In short, a page loaded over HTTPS, or over HTTP from localhost/127.0.0.1, is considered a secure context.
See Secure Context and WebGPU: Troubleshooting tips and fixes for more details.
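An application can check for this at runtime before requesting WebGPU. The sketch below relies on the standard `isSecureContext` global and on `navigator.gpu`; the function names, the `env` parameter and the fallback to the WebAssembly execution provider are assumptions made for illustration.

```javascript
// True when the page runs in a secure context and the browser exposes WebGPU.
// `env` defaults to the browser globals and is a hypothetical injection point.
function canUseWebGpu(env = globalThis) {
  return env.isSecureContext === true && Boolean(env.navigator && env.navigator.gpu);
}

// Prefer WebGPU when it is available, otherwise fall back to WebAssembly.
function preferredExecutionProviders(env = globalThis) {
  return canUseWebGpu(env) ? ['webgpu', 'wasm'] : ['wasm'];
}
```

The resulting list could be passed as the `executionProviders` session option when creating an inference session.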
Content Security Policy (CSP) restricted environments
Since ONNX Runtime Web v1.19, the WebAssembly binary file(s) and workers can be loaded in CSP restricted environments. The necessary artifacts must be served for this to work.