Change log#
mobu is versioned with semver. Dependencies are updated to the latest available version during each release. Those changes are not noted here explicitly.
Find changes for the upcoming release in the project’s changelog.d.
16.0.0 (2025-07-10)#
Backwards-incompatible changes#
- The GitHub Mobu CI app now listens to pull_request events instead of check_suite and check_run events. When this version of mobu is deployed to an environment, the app permissions and subscribed events in the Permissions & Events tab of the Developer Settings for the app need to be modified.
  - The permissions in the Repository permissions accordion section need to be changed:
    - In the Pull requests row, change the Access drop-down to Read-only
  - The events in the Subscribe to events section of the Permissions & Events tab have to be changed:
    - Uncheck the Check run box
    - Uncheck the Check suite box
    - Check the Pull request box

  This fixes a bug where a Mobu CI job would never run when a PR was opened.
15.4.0 (2025-07-02)#
New features#
- Support setting supplemental groups for users, allowing mobu to test services that use group membership for access control. 
Other changes#
- Use uv to maintain frozen dependencies and set up a development environment. 
15.3.1 (2025-06-30)#
Bug fixes#
- Notebook cache now only re-clones once after invalidation. Previously, it could get into a re-clone loop. 
- GitHub refresh no longer breaks for flocks with multiple monkeys running repos with multiple notebooks.
15.3.0 (2025-06-26)#
New features#
- Make notebook filtering simpler and more flexible.
  - Deprecate exclude_dirs
  - Add collection_rules to every notebook runner business, not just NotebookRunnerList

  collection_rules looks like this:

      collection_rules:
        - type: "exclude_union_of"
          patterns:
            - "not-these/**"
            - "not/these/either/**"
        - type: "intersect_union_of"
          patterns:
            - "this.ipynb"
            - "these/**"
            - "also/these**"
        - type: "intersect_union_of"
          patterns:
            - "**/these-*"

  - Each entry is a pattern using the Python pathlib glob pattern language
  - Start with all notebooks in the repo.
  - For each collection rule, remove notebooks:
    - Intersect rules will remove notebooks that are not in the intersection of:
      - The current set
      - The union of the matched patterns.
    - Exclude rules will remove notebooks from the current set that are in the union of the matched patterns.
  - Remove any remaining notebooks that require unavailable services.
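The rule evaluation order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mobu's implementation: the function name collect_notebooks, the throwaway repo layout, and the simplified patterns are all hypothetical, and the final step (dropping notebooks that require unavailable services) is left out.

```python
import tempfile
from pathlib import Path


def collect_notebooks(root: Path, rules: list[dict]) -> set[str]:
    """Apply collection rules in order and return the surviving notebooks."""
    # Start with all notebooks in the repo.
    current = {p.relative_to(root).as_posix() for p in root.rglob("*.ipynb")}
    for rule in rules:
        # Union of the paths matched by any pattern in this rule.
        matched: set[str] = set()
        for pattern in rule["patterns"]:
            matched |= {p.relative_to(root).as_posix() for p in root.glob(pattern)}
        if rule["type"] == "intersect_union_of":
            # Keep only notebooks that are also in the matched union.
            current &= matched
        elif rule["type"] == "exclude_union_of":
            # Drop notebooks that are in the matched union.
            current -= matched
        else:
            raise ValueError(f"unknown rule type: {rule['type']}")
    return current


# Build a throwaway repo layout (hypothetical file names) to try the rules on.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    for name in ("this.ipynb", "these/a.ipynb", "not-these/b.ipynb", "other/c.ipynb"):
        path = repo / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    rules = [
        {"type": "exclude_union_of", "patterns": ["not-these/*.ipynb"]},
        {"type": "intersect_union_of", "patterns": ["this.ipynb", "these/*.ipynb"]},
    ]
    selected = sorted(collect_notebooks(repo, rules))

print(selected)  # ['these/a.ipynb', 'this.ipynb']
```

Because each rule only ever shrinks the current set, the order of rules matters only for readability here; an intersect rule placed first simply narrows what later exclude rules have to consider.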
15.2.0 (2025-05-20)#
New features#
- Add a new notebook_idle_time config parameter to all NotebookRunner-based businesses to configure how long to wait between notebook executions.
15.1.0 (2025-03-20)#
New features#
- Multiple replicas of mobu can be run in an environment. The number of monkeys specified in a flock’s count will be spread evenly across all replicas. Note that this comes with certain restrictions; see the docs for details.
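One way to picture an even spread of a flock's count across replicas is the following sketch. The changelog does not describe how mobu actually assigns monkeys to replicas, so the function name and the choice to give any remainder to the lowest-numbered replicas are assumptions made purely for illustration.

```python
def monkeys_per_replica(count: int, replicas: int) -> list[int]:
    """Split count monkeys across replicas so sizes differ by at most one."""
    if replicas < 1:
        raise ValueError("at least one replica is required")
    base, extra = divmod(count, replicas)
    # Assumption: the first `extra` replicas each take one additional monkey.
    return [base + 1 if index < extra else base for index in range(replicas)]


print(monkeys_per_replica(10, 3))  # [4, 3, 3]
```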
Bug fixes#
- Do not send a summary message to Slack if there are no flocks running. 
- One period of execution_idle_time elapses after trying to execute a notebook with no code cells, instead of not waiting at all.
15.0.0 (2025-03-17)#
Backwards-incompatible changes#
- The NotebookRunner business has been split into two different businesses: NotebookRunnerCounting and NotebookRunnerList. The difference is that NotebookRunnerCounting takes the max_executions option, which refreshes the lab after that number of notebook executions, and NotebookRunnerList takes the notebooks_to_run option, which runs all of the notebooks in that list before refreshing. Currently, NotebookRunnerList is only used by the GitHub CI functionality. Any references to NotebookRunner in any flock config need to be changed to one of these new businesses, almost certainly NotebookRunnerCounting.
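As a rough illustration of the required config change, the sketch below rewrites the business type in a flock definition held as a Python dictionary. The business, type, options, count, and max_executions keys come from this change log; the helper name migrate_flock, the name key, and the sample values are hypothetical.

```python
import copy


def migrate_flock(flock: dict, replacement: str = "NotebookRunnerCounting") -> dict:
    """Return a copy of a flock config with NotebookRunner replaced."""
    migrated = copy.deepcopy(flock)
    business = migrated.get("business", {})
    if business.get("type") == "NotebookRunner":
        business["type"] = replacement
    return migrated


# Hypothetical flock definition using the retired business name.
old_flock = {
    "name": "example-flock",
    "count": 2,
    "business": {"type": "NotebookRunner", "options": {"max_executions": 25}},
}
new_flock = migrate_flock(old_flock)
print(new_flock["business"]["type"])  # NotebookRunnerCounting
```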
New features#
- Add support for running against Nublado configured with user subdomains. 
- Add a gafaelfawr_timeout config option. With very large numbers of users, like for scale testing, the default httpx timeouts from the Safir HTTP client may not be long enough.
- Add new NotebookRunnerInfinite business that does not interact further with JupyterHub after the notebook has been spawned, avoiding the pings mobu normally uses to refresh authentication credentials. This is a closer match to the typical access pattern for a regular user.
Bug fixes#
- Batch Gafaelfawr token creations in groups of 10 instead of attempting to perform them all in parallel. Gafaelfawr has to serialize them on database transactions anyway, so running all token creations at once with a large flock causes problems with HTTP request timeouts. 
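The batching idea can be sketched with asyncio as follows. This is a minimal illustration, not mobu's code: create_token is a hypothetical stand-in that makes no real Gafaelfawr request, and the in-flight counter exists only to show that no more than one batch runs at a time.

```python
import asyncio

in_flight = 0  # requests currently running (for demonstration only)
peak = 0  # highest number of simultaneous requests observed


async def create_token(username: str) -> str:
    """Hypothetical stand-in for a token creation request."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0)  # yield to the event loop like a real request would
    in_flight -= 1
    return f"token-for-{username}"


async def create_tokens_in_batches(usernames: list[str], batch_size: int = 10) -> list[str]:
    """Create tokens in groups of batch_size rather than all in parallel."""
    tokens: list[str] = []
    for start in range(0, len(usernames), batch_size):
        batch = usernames[start : start + batch_size]
        # Only this batch runs concurrently; the next waits for it to finish.
        tokens.extend(await asyncio.gather(*(create_token(u) for u in batch)))
    return tokens


users = [f"example-user{i:02d}" for i in range(25)]
tokens = asyncio.run(create_tokens_in_batches(users))
print(len(tokens), peak)  # 25 10
```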
14.2.1 (2025-02-26)#
Bug fixes#
- Avoid an unbound variable exception during a Nublado client error handling path. 
14.2.0 (2025-02-26)#
New features#
- Add SIAv2 QuerySet runner, which uses pyvo.search to query the DP0.2 SIAv2 service.
Bug fixes#
- CI jobs will now run all notebooks included in the PR, not just the ones changed in the latest commit. This fixes the case where the latest commit only fixes one of multiple bad notebooks in a PR, but passes the Mobu CI check. 
14.1.0 (2025-02-20)#
New features#
- All time durations in business configurations can now be given as human-readable durations with suffixes such as h, m, and s. For example, 5m30s indicates a duration of five minutes and thirty seconds, or 330 seconds.
- Add log_monkeys_to_file config option to choose whether to write monkey logs to files or console.
- Add start_batch_size and start_batch_wait flock config parameters to allow starting monkeys in a slower and more controlled way.
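To make the human-readable duration format concrete, here is a small parser sketch. It is not the parser mobu uses; the function name and the restriction to the h, m, and s suffixes (in that order) are assumptions for illustration. Bare numbers are treated as seconds, matching the behavior noted for 7.0.0.

```python
import re
from datetime import timedelta

# Optional hours, minutes, and seconds components, in that order.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '5m30s' or '1h' into a timedelta."""
    text = text.strip()
    if text.isdigit():
        # A bare number is interpreted as a number of seconds.
        return timedelta(seconds=int(text))
    match = _DURATION.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(parse_duration("5m30s").total_seconds())  # 330.0
```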
Bug fixes#
- When starting a flock, create user tokens simultaneously (up to the limit of the httpx connection pool size of 100) rather than serially. 
- Fix jitter calculations in Nublado businesses. 
- Notebook repos are only cloned once per process (and once per refresh request), instead of once per monkey. This should speed up how fast NotebookRunner flocks start, especially in load testing use cases.
Other changes#
- Modify TAPBusiness to use pyvo’s run_async instead of using submit_job and polling.
14.0.0 (2025-01-31)#
Backwards-incompatible changes#
- Instrument tracing and exception handling with Sentry. All timings are now calculated with Sentry tracing functionality, and all Slack notifications for errors come from Sentry instead of the Safir SlackException machinery.
New features#
- Send an app metrics event for EmptyLoop business iterations.
- Remove the limit from the autostart aiojobs Scheduler. Attempts to start a job past the limit resulted in jobs silently never starting. There are no cases where we would want to limit the autostart concurrency, so a limit is not needed.
13.2.0 (2024-12-17)#
New features#
- Publish application metrics. 
13.0.1 (2024-11-19)#
Bug fixes#
- Fix handling of Jupyter XSRF cookies. 
13.0.0 (2024-11-12)#
Backwards-incompatible changes#
- All app config, including autostart config (and excluding secrets, which still come from environment variables) now comes from a single YAML file, provisioned by a single ConfigMap in Phalanx.
12.0.2 (2024-10-31)#
Bug fixes#
- Improve exception reports from the Nublado client. 
12.0.1 (2024-10-29)#
Bug fixes#
- Fix exceptions in the Nublado notebook runner caused by not having the cell ID. 
12.0.0 (2024-10-28)#
Other changes#
- Replace the internal Nublado client with the new client released to PyPI. 
11.0.0 (2024-08-06)#
Backwards-incompatible changes#
- Remove the exclude_dirs option from NotebookRunner options, which means it can no longer be set in the autostart config. exclude_dirs must be set in an in-repo mobu.yaml config file.
New features#
- NotebookRunner business will skip notebooks in environments that do not have the services required for them to run. Required services can be declared by adding metadata to a notebook.
- Allow specification of the log level for individual flocks. 
Bug fixes#
- Inspect individual redirects for JupyterHub logins as well as JupyterLab to get updated XSRF cookies. 
10.1.0 (2024-07-12)#
Other changes#
- Update to the latest Safir release with GitHub model changes. 
10.0.0 (2024-07-11)#
Backwards-incompatible changes#
- GitHub CI and refresh app config are now each a separate, all-or-nothing set of config that comes from a mix of a YAML file and environment variables. This requires some new and different Helm values in Phalanx (see https://mobu.lsst.io/operations/github-ci-app.html#add-phalanx-configuration).
- The GitHub CI app now takes the scopes it assigns from config values, rather than hardcoding a list of scopes. 
9.0.0 (2024-07-09)#
Backwards-incompatible changes#
- The existing refresh functionality is now a GitHub app integration instead of a simple webhook integration. This requires new Phalanx secrets to be synced, and a new GitHub app to be added to repos that want the functionality. Special care has been taken not to leave these checks in a forever-in-progress state, even in the case of (graceful) mobu shutdown or restart.
New features#
- A GitHub app integration to generate GitHub actions checks for commits pushed to notebook repo branches that are part of active PRs. These checks trigger and report on a solitary Mobu run of the changed notebooks in the commit. 
8.1.0 (2024-05-30)#
New features#
- NotebookRunner flocks can now pick up changes to their notebooks without having to restart the whole mobu process. This refresh can happen via:
  - GitHub push webhook post to /mobu/github/webhook with changes to a repo and branch that matches the flock config
  - monkeyflocker refresh <flock>
  - POST to /mobu/flocks/{flock}/refresh
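The POST variant of the refresh can be sketched with the standard library as shown below. Only the /mobu/flocks/{flock}/refresh path comes from this change log; the base URL, the flock name, the token value, and the use of a bearer token in the Authorization header are assumptions for illustration. The sketch builds the request without sending it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_refresh_request(base_url: str, flock: str, token: str) -> Request:
    """Build (but do not send) a POST request that refreshes one flock."""
    url = f"{base_url.rstrip('/')}/mobu/flocks/{quote(flock, safe='')}/refresh"
    # Assumption: the deployment expects a bearer token for authentication.
    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


request = build_refresh_request("https://rsp.example.org", "example-flock", "example-token")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it to a real deployment.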
 
8.0.0 (2024-05-21)#
Backwards-incompatible changes#
- NotebookRunner business now runs all notebooks in a repo, at the root and in all subdirs recursively, by default.
- Add exclude_dirs option to NotebookRunner business to list directories in which notebooks will not be run.
7.1.1 (2024-03-28)#
Bug fixes#
- Correctly extract cookies from the middle of the redirect chain caused by initial authentication to a Nublado lab. This fixes failures seen with labs containing JupyterHub 4.1.3. 
7.1.0 (2024-03-21)#
New features#
- Add GitLFSBusiness for testing Git LFS by storing and retrieving a Git LFS-managed artifact.
Bug fixes#
- Properly handle the XSRF tokens for JupyterHub and the Jupyter lab by storing separate tokens for the hub and lab after initial login and sending the appropriate XSRF token in the X-XSRFToken header to the relevant APIs. This fixes a redirect loop at the Jupyter lab when running 4.1.0 or later.
Other changes#
- mobu now uses uv to maintain frozen dependencies and set up a development environment. 
7.0.0 (2023-12-15)#
Backwards-incompatible changes#
- Drop support for cachemachine and Nublado v2. The cachemachine_image_policy and use_cachemachine configuration options are no longer supported and should be deleted.
- Rename the existing TAPQueryRunner business to TAPQuerySetRunner to more accurately capture what it does. Add a new TAPQueryRunner business that runs queries chosen randomly from a list. Based on work by @stvoutsin.
- Rename JupyterPythonLoop to NubladoPythonLoop to make it explicit that it requires Nublado and will not work with an arbitrary JupyterHub.
New features#
- Convert all configuration options that took intervals in seconds to timedelta. Bare numbers will still be interpreted as a number of seconds, but any format Pydantic recognizes as a timedelta may now be used.
Other changes#
- All environment variables used to configure mobu now start with MOBU_, and several have changed their names. The new settings are MOBU_ALERT_HOOK, MOBU_AUTOSTART_PATH, MOBU_ENVIRONMENT_URL, MOBU_GAFAELFAWR_TOKEN, MOBU_NAME, MOBU_PATH_PREFIX, MOBU_LOGGING_PROFILE, and MOBU_LOG_LEVEL. This is handled by the Phalanx application, so no configuration changes should be required.
6.1.1 (2023-07-06)#
Bug fixes#
- Rather than dumping the full monkey data when summarizing flocks, which can cause long enough delays that in-progress calls fail due to the huge amount of timing data, extract only the success and failure count from the running business. This should be considerably faster and avoid timeout problems. 
- Improve error reporting by catching exceptions thrown while sending code to the lab WebSocket for execution. 
6.1.0 (2023-05-31)#
New features#
- The timeout when talking to JupyterHub and Jupyter labs can now be configured in the business options (as jupyter_timeout). The default is now 60s instead of 30s.
Bug fixes#
- When reporting httpx failures to Slack, put the response body into an attachment instead of a block so that it will be collapsed if long. 
- Fix reporting of WebSocket open timeouts to Slack. 
6.0.0 (2023-05-22)#
Backwards-incompatible changes#
- Configuration of whether to use cachemachine and, if so, what image policy to use is now done at the business level instead of globally. This allows the same mobu instance to test both Nublado v2 and Nublado v3. 
New features#
- The maximum allowable size for a WebSocket message from the Jupyter lab is now configurable per business and defaults to 10MB instead of 4MB. 
Bug fixes#
- Revert change in 5.0.0 to number all cells, and go back to counting only code cells for numbering purposes. This matches the way cell numbers are displayed in the Jupyter lab UI. 
- When reporting errors to Slack, mobu 5.0.0 mistakenly started stripping ANSI escape sequences from the code being executed rather than from the error output. The code should already be safe since it comes from local notebooks or configuration, whereas the error output is where Jupyter labs like to add formatting. Strip ANSI escape sequences from the error output instead of the code.
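Stripping ANSI escape sequences from error output can be sketched with a regular expression, as below. This is an illustrative sketch rather than mobu's implementation: the function name and sample traceback text are hypothetical, and the pattern covers only CSI sequences (the kind used for colors), not every possible terminal escape.

```python
import re

# ESC [ followed by parameter bytes, intermediate bytes, and a final byte.
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences such as color codes."""
    return _ANSI_CSI.sub("", text)


# Hypothetical colorized error output as a Jupyter lab might return it.
error_output = "\x1b[0;31mNameError\x1b[0m: name 'x' is not defined"
print(strip_ansi(error_output))  # NameError: name 'x' is not defined
```

The executed code itself is passed through unchanged, which is the behavior this fix restores.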
5.1.0 (2023-05-15)#
New features#
- mobu now uses httpx instead of aiohttp for all HTTP requests (including websockets for WebSocket connections and httpx-sse for EventStream connections) and makes use of the Safir framework for parsing and reporting HTTP client exceptions. Alerts for failing web requests will be somewhat different and hopefully clearer. 
- mobu now sends keep-alive pings on the WebSocket connection to the lab, hopefully allowing successful execution of cells that take more than five minutes to run. 
- Nublado-based businesses can now set debug to true in the image specification to request that debugging be enabled in the spawned Jupyter lab.
- mobu now catches timeouts attempting to open a WebSocket to the lab and reports them to Slack with more details. 
- Slack alerts from monkeys now include the flock and monkey name as a field in the alert. 
- Unexpected business exceptions now include an “Exception type” heading and use “Failed at” instead of “Date” to match the display of expected exceptions. 
- The prefix for mobu routes (/mobu by default) can now be configured with SAFIR_PATH_PREFIX.
- Uncaught exceptions from mobu’s route handlers are now also reported to Slack. 
Bug fixes#
- The code to determine the Docker reference and description of the running Nublado image is now more robust against unexpected output. 
- Node and cell information in Slack error reports for Nublado errors are now formatted as full blocks rather than fields, since they are often too wide to fit nicely in the limited width of a Slack Block Kit field. 
Other changes#
- The default error_idle_time for Nublado-based businesses is back to 60 seconds instead of 10 minutes. The problem the longer timeout was working around should be fixed in the new Nublado lab controller.
- Nublado-based notebooks now request the JUPYTER_IMAGE_SPEC environment variable instead of JUPYTER_IMAGE to get the running image for error reporting purposes. This is now the preferred environment variable and JUPYTER_IMAGE is deprecated.
- mobu now uses the Ruff linter instead of flake8, isort, and pydocstyle. 
5.0.0 (2023-03-22)#
Backwards-incompatible changes#
- Settings are now handled with Pydantic and undergo much stricter validation. In particular, the Slack web hook URL must now be a valid URL if provided. 
- In order to enable stricter and more useful Pydantic validation of flock specifications, the syntax for creating a flock has changed. business is now a dictionary, the restart option has been moved under it, the type of business is specified with the type key, and the business configuration options have moved under business as the options key. Options that are not applicable to a given business type are now rejected.
- The jupyter.url_prefix option is now just url_prefix, and jupyter.image is now just image. The names of the settings under image have changed.
- The TAPQueryRunner options tap_sync and tap_query_set are now just sync and query_set.
- lab_settle_time is no longer supported as a configuration option for the businesses that spawn a Nublado lab. It defaulted to 0 and we never set it.
- JupyterJitterLoginLoop has been retired. Instead, set the jitter option on JupyterPythonLoop.
- JupyterLoginLoop has been merged with JupyterPythonLoop. The only difference in the former is that no lab session was created and no code was run, which seems pointless and not worth the distinction. JupyterPythonLoop runs a simple addition by default, which should be an improvement over JupyterLoginLoop in every likely situation.
New features#
- When the production logging profile is used, the messages from monkeys are no longer reported to the main mobu log, only to the individual monkey logs. This should produce considerably less noise in external log aggregators. 
- The notebook being run is now included in all Slack error reports, not just for code execution failures. 
- The API documentation now shows only the relevant options for the type of business when showing how to create a flock. 
- Add support for running a business once and returning its results, via a POST to the new /run endpoint.
- Add support for the new Nublado lab controller (see SQR-066).
- The time a business pauses after a failure before it is restarted is now configurable with the error_idle_time option and defaults to 10 minutes (instead of 1 minute) for Nublado businesses, since this is how long JupyterHub will wait for a lab to spawn before giving up.
Bug fixes#
- The dp0.2 TAPQueryRunner query set is now lighter-weight and will consume less memory and CPU to execute, hopefully reducing timeout errors.
- Cell numbering in error reports is now across all cells, not just code cells. 
- TAPQueryRunner no longer creates a TAP client in its __init__ method, since creating a TAP client makes HTTP requests to the TAP server that can fail and failure would potentially crash mobu. Instead, it creates the TAP client in startup and handles exceptions properly so that they’re reported to Slack.
- Business failures during startup are now counted as a failed execution so that a business that fails repeatedly in startup doesn’t report 100% success in the flock summary.
- The code run by JupyterPythonLoop and NotebookRunner to get the Kubernetes node on which the lab is running now uses lsst.rsp.get_node instead of the deprecated rubin_jupyter_utils.lab.notebook.utils.get_node.
Other changes#
- Slightly improve logging when monkeys are shut down due to errors. 
- mobu’s internals have been extensively refactored following the design in SQR-072 to hopefully make future maintenance easier.