dlt This Week: Replace Loads, Secrets, and SQLAlchemy DuckDB

   |   4 minute read   |   Using 738 words

dlt is a Python data loading toolkit, and this activity batch is worth reading if your pipelines use replace loads, dataset relations, or managed secret storage. The headline change is practical: empty replace resources now clear destination tables through package state instead of relying on fake zero row load jobs.

Empty replace loads now clear tables

The most operator visible work is around replace behavior. The maintainers first used the truncate mechanism for replace tables without data, then used empty files to truncate top level tables, and also fixed replace strategy switching. This matters when a resource normally replaces a table chain but sometimes yields no rows. The correct result is not a quiet no op. The old data has to go.

The changes touch dlt/extract/extract.py, dlt/load/utils.py, and tests/load/pipeline/test_replace_disposition.py. The tests now cover resources that yield nothing, yield None, or yield an empty list. Child tables are included, so nested records do not leave stale rows behind after the root resource goes empty.

There is also better visibility in load package data. The docs in docs/website/docs/running-in-production/running.md now show that package info exposes refresh, dropped_tables, and truncated_tables. That is the part to wire into alerts if replace loads are part of your production recovery story.

package = load_info.load_packages[0]
print(package.refresh)
print(package.dropped_tables)
print(package.truncated_tables)

Dataset SQL wrapping got less brittle

The latest commit moved misplaced merge subqueries to post qualify handling. The short version: dataset relations now handle more projection, join, order, limit, and offset shapes without producing SQL that a destination rejects or reads at the wrong level.

The SQL work is centered in dlt/common/libs/sqlglot.py, dlt/dataset/relation.py, and dlt/dataset/_join.py. New helper logic detects pure column projections, moves ORDER BY, LIMIT, and OFFSET from an inner query onto a wrapper when safe, and keeps them inside when the sort key would be lost. That sounds small until you run the same relation SQL across DuckDB, Postgres, TSQL, Trino, Snowflake, and Databricks.

One detail is worth calling out. ORDER BY against an output alias is now bound back to the source expression for plain selects, but it stays on the output column under DISTINCT and GROUP BY. That is the kind of dialect edge case that usually appears only after a real user chains a few relation calls and a destination refuses the generated SQL. The tests added in this area are a good map of the expected behavior.

AWS secrets and DuckDB SQLAlchemy support landed

Configuration got a useful new backend with AWS Secrets Manager provider support. The provider is disabled by default, follows the same vault fragment model as the existing providers, and uses a default dlt/ prefix so secret names can live in a shared AWS account without turning into a naming mess.

For operators, the important settings are plain:

  • enable_aws_secrets
  • secret_name_prefix
  • only_secrets
  • only_toml_fragments
  • list_secrets
[providers]
enable_aws_secrets = true

[providers.aws_secrets]
only_secrets = false
only_toml_fragments = false
list_secrets = true
secret_name_prefix = "dlt/"

The same batch also enabled DuckDB through the SQLAlchemy destination. That does not replace the native DuckDB destination. It gives users an option when they already manage a SQLAlchemy engine, need duckdb-engine, or want connection setup that the native path does not expose.

There was smaller destination documentation work too. The filesystem destination docs now include Tigris through the Tigris filesystem destination update. That is not a runtime change, but it is useful for teams standardizing object storage targets.

Workspace CLI now fails louder

The workspace CLI behavior changed in a way that should reduce accidental command execution. The maintainers dropped dlt to dlthub command proxying in workspace context. In an active dltHub Workspace, dlt init, dlt pipeline, dlt schema, and related commands no longer silently dispatch into dlthub.

Instead, the command prints the replacement and exits. dlt --version still works. That is a better failure mode for scripts, because a workspace context no longer changes what the dlt executable means under your feet.

The docs around hub installation also moved with this change in docs/website/docs/hub/getting-started/installation.md. If your team has internal runbooks that mix dlt and dlthub, this is the commit to read before users start filing “same command, different behavior” bugs.

How to prepare

  • Test replace resources that can emit no rows. Check load_info.load_packages[0].truncated_tables and make sure stale child tables are gone.
  • If you run inside dltHub Workspace, update scripts from dlt pipeline style commands to the matching dlthub local commands.
  • For AWS Secrets Manager, start with TOML fragments. Turn on list_secrets only when the role has secretsmanager:ListSecrets.


denis256 at denis256.dev