Building a Plugin System in DAP

How we built a plugin system that lets a NuGet package contribute triggers, skills, scripting engines, storage, and chat UI to a running .NET host, plus the hard parts we hit: load-context type identity, per-plugin DI isolation, a lifecycle chicken-and-egg, and a 2 GB memory war story.

A customer asked us to make their agents read a shared mailbox. A week later, another wanted Telegram. Then someone needed audit events shipped to CloudWatch, and a fourth team wanted S3 instead of our default blob store. Each request was reasonable. Each one, if we built it the obvious way, bolted another integration onto the core platform and onto the core's release train. Ship a Telegram fix, re-deploy everything.

So we built a plugin system. The goal we set: a third party drops a NuGet package into a running DAP instance and that package contributes triggers, skills, scripting engines, storage backends, even chat UI, without us recompiling the host. This post walks through the parts that turned out to be hard, with the code that solved them.

What a plugin can contribute

DAP doesn't have one plugin interface. It has twelve contribution points, and a single package can implement any mix of them: ISkill, ITriggerPlugin, IChatMiddleware, IScriptingEngine, ISkillProvider, IEmbeddingEngineProvider, IBlobStorageProvider, ILogSinkPlugin, IAuditSinkPlugin, IPluginEndpoint, IPluginHealthCheck, and IDAPEventHandler<T>. Every plugin also ships a descriptor and a bootstrapper.

The contract a trigger author touches is small, and most of it is optional:

public interface ITriggerPlugin : IDisposable
{
    string TriggerType { get; }

    string DisplayName => TriggerType;
    string? Description => null;
    IReadOnlyList<PluginConfigField> CredentialSchema => [];
    IReadOnlyList<PluginConfigField> ConfigSchema => [];

    Task StartAsync(ITriggerContext ctx, CancellationToken ct);
    Task StopAsync(CancellationToken ct);

    Task OnCredentialChangedAsync(Guid credentialId, bool deleted, CancellationToken ct) =>
        Task.CompletedTask;
    Task OnConfigChangedAsync(Guid triggerConfigId, bool deleted, CancellationToken ct) =>
        Task.CompletedTask;

    IReadOnlyList<TriggerDisplayField> DisplayFields => [];
    string? DisplayIcon => null;
}

Almost every member has a default. A plugin that fires agents from a single source needs TriggerType, StartAsync, and StopAsync, plus the Dispose it inherits from IDisposable. The rest (credential schemas, config schemas, display fields, credential-change hooks) exists for plugins that grow into it. Default interface methods are what let us keep adding to the contract. When we added DisplayFields for chat rendering months after the first plugins shipped, every existing plugin kept compiling because the default is an empty list. No coordinated upgrade, no breaking change.
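Here's a minimal sketch of that growth path, using a trimmed stand-in for the contract. ITriggerPluginSketch and HeartbeatTrigger are hypothetical names for illustration, not DAP types. One caveat: a default member is reachable only through the interface type, not through the implementing class.

```csharp
using System;
using System.Collections.Generic;

// Trimmed, hypothetical stand-in for the trigger contract.
public interface ITriggerPluginSketch
{
    string TriggerType { get; }

    // Defaults: an implementer can ignore these entirely.
    string DisplayName => TriggerType;

    // Imagine this member was added after the first plugins shipped.
    IReadOnlyList<string> DisplayFields => Array.Empty<string>();
}

// Written "before" DisplayFields existed: implements only the required member.
public sealed class HeartbeatTrigger : ITriggerPluginSketch
{
    public string TriggerType => "heartbeat";
}

public static class DefaultMemberDemo
{
    public static string Describe()
    {
        // The variable must be typed as the interface to see the defaults.
        ITriggerPluginSketch plugin = new HeartbeatTrigger();
        return $"{plugin.DisplayName}:{plugin.DisplayFields.Count}";
    }
}
```

Calling DefaultMemberDemo.Describe() yields "heartbeat:0": the older plugin picks up both defaults without changing a line.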

The descriptor follows the same minimalist line. Here's the bundled sample plugin's entire identity:

public sealed class SamplePluginDescriptor : PluginDescriptorBase
{
    public override IReadOnlyList<PluginConfigField> ConfigSchema =>
    [
        new PluginConfigField(
            Key: "ApiKey",
            Label: "API Key",
            Description: "Secret key used to call the external service.",
            FieldType: PluginConfigFieldType.Password,
            IsRequired: true,
            DefaultValue: null,
            SelectOptions: null),
        // ... BaseUrl, LogLevel
    ];
}

The author overrides one property. PluginId, DisplayName, and Version come from PluginDescriptorBase, which reads them out of assembly metadata stamped by our MSBuild targets:

protected PluginDescriptorBase()
{
    var assembly = GetType().Assembly;
    PluginId = GetMetadata(assembly, "DAP.PluginId") ?? assembly.GetName().Name ?? "unknown";
    DisplayName = GetMetadata(assembly, "DAP.DisplayName") ?? PluginId;
    Version = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
        ?.InformationalVersion?.Split('+')[0]      // strip the git hash suffix
        ?? assembly.GetName().Version?.ToString(3) ?? "0.0.0";
}

We learned to push every value we could into one source of truth. A plugin author who has to keep a PluginId string in sync between the .csproj and three C# files will eventually let them drift.

Loading, and the type-identity trap

We load plugins with McMaster.NETCore.Plugins, which wraps AssemblyLoadContext so each plugin lives in its own load context and can be unloaded:

var loader = PluginLoader.CreateFromAssemblyFile(
    mainDll,
    SharedTypes,
    config =>
    {
        config.IsUnloadable = true;
        config.LoadInMemory = false;
    });

var assembly = loader.LoadDefaultAssembly();

That SharedTypes argument is the part that cost us the most debugging. A separate load context means the runtime keeps its own copy of every type the plugin references, including ours. A plugin's ITriggerPlugin is a different type from the host's ITriggerPlugin unless we tell the loader they're the same. When they aren't shared, the symptom is maddening: the plugin compiles, loads, and then instance is ITriggerPlugin returns false for an object that visibly implements it. No exception, just a cast that quietly fails at a boundary you forgot was there.
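The symptom can be reproduced with nothing but the runtime's own AssemblyLoadContext. The sketch below is not our loader code; it assumes a normal file-based build (Assembly.Location is empty for in-memory or single-file builds), and IContractSketch and ContractImpl are hypothetical names. It loads a second copy of the current assembly into an isolated context and asks whether an object created there satisfies the host's interface.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical contract and implementation, both defined in this assembly.
public interface IContractSketch { }
public sealed class ContractImpl : IContractSketch { }

public static class TypeIdentityDemo
{
    public static bool CopySatisfiesHostContract()
    {
        // Path of the assembly that defines the contract (assumes a file-based build).
        string path = typeof(ContractImpl).Assembly.Location;
        var isolated = new AssemblyLoadContext("isolated", isCollectible: true);
        try
        {
            // A second, independent copy of the same assembly.
            Assembly copy = isolated.LoadFromAssemblyPath(path);
            Type copiedType = copy.GetType(typeof(ContractImpl).FullName!, throwOnError: true)!;
            object instance = Activator.CreateInstance(copiedType)!;

            // The instance implements the copy's IContractSketch, not the host's.
            return instance is IContractSketch;
        }
        finally
        {
            isolated.Unload();
        }
    }
}
```

Under those assumptions the method should return false, even though the class visibly implements the interface in source. Sharing the contract type with the plugin's context is what makes the check succeed.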

The fix is to hand the loader the exact list of types that cross the boundary:

private static readonly Type[] SharedTypes =
[
    typeof(ISkill),
    typeof(SkillContext),
    typeof(ITriggerPlugin),
    typeof(ITriggerContext),
    typeof(IPluginBootstrapper),
    typeof(IPluginDescriptor),
    typeof(IPluginConfigProvider),
    typeof(IPluginDataStore),
    typeof(IDAPEventHandler<>),
    typeof(IScriptingEngine),
    // ... ~40 entries
];

This list is the real public API of the plugin system. Not the interfaces, the shared types. Every contract a plugin and host pass between them lives here, and adding a contribution point means adding its types to this array. We treat it like an ABI. If you take one thing from this post: decide your shared-types boundary before you write your first plugin, because every cast across it depends on getting this exactly right.

Finding the right DLL inside a NuGet package is its own small problem. A package ships a lib/ folder with one subdirectory per target framework, and often bundles its own dependencies next to the plugin assembly. We resolve the framework with NuGet's own reducer and then pick the assembly that matches the package ID:

var hostTfm = NuGetFramework.ParseFolder("net9.0");
var reducer = new FrameworkReducer();
var nearest = reducer.GetNearest(hostTfm, frameworks.Select(x => x.Tfm));
// ...
var exactMatch = dlls.FirstOrDefault(f =>
    Path.GetFileNameWithoutExtension(f)
        .Equals(packageIdHint, StringComparison.OrdinalIgnoreCase));

When the name doesn't match, we fall back to excluding the assemblies we know are shared infrastructure or common transitive dependencies (Microsoft.*, System.*, MailKit, Npgsql, and so on). It's heuristic, and we'd rather it were declarative, but it has held up across every plugin we've shipped.
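A rough sketch of that two-step pick, under stated assumptions: the prefix list is abbreviated to the names mentioned above, PluginDllPicker is a hypothetical helper, and returning null when the fallback is ambiguous is our illustration's choice rather than a description of the real installer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PluginDllPicker
{
    // Abbreviated, illustrative list of shared or transitive assemblies to skip.
    private static readonly string[] SharedPrefixes =
        { "Microsoft.", "System.", "MailKit", "Npgsql" };

    public static string? Pick(IEnumerable<string> dllPaths, string packageIdHint)
    {
        var dlls = dllPaths.ToList();

        // Step 1: an assembly named after the package wins outright.
        var exact = dlls.FirstOrDefault(f =>
            Path.GetFileNameWithoutExtension(f)
                .Equals(packageIdHint, StringComparison.OrdinalIgnoreCase));
        if (exact is not null)
            return exact;

        // Step 2: drop known infrastructure and see what is left.
        var candidates = dlls
            .Where(f => !SharedPrefixes.Any(p =>
                Path.GetFileNameWithoutExtension(f)
                    .StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();

        // Assumption: refuse to guess when more than one candidate remains.
        return candidates.Count == 1 ? candidates[0] : null;
    }
}
```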

One DI container per plugin

A plugin needs services. The Telegram plugin wants a single TelegramBotClient shared across its trigger and its session store. We let plugins register their own internals through a bootstrapper:

public sealed class TelegramPluginBootstrapper : IPluginBootstrapper
{
    public void Register(IServiceCollection services)
    {
        services.AddSingleton<TelegramBotClient>();
        services.AddSingleton<TelegramSessionStore>();
    }
}

We could have merged those registrations into the host's container. We didn't, for two reasons. A plugin's singletons should die when the plugin unloads, and a plugin shouldn't be able to resolve, or collide with, another plugin's services. So each plugin gets a child container, and a composite provider that checks the child first and falls back to the host for shared infrastructure:

internal sealed class PluginServiceProvider(
    ServiceProvider childProvider,
    IServiceProvider hostProvider) : IServiceProvider, IDisposable
{
    public object? GetService(Type serviceType)
    {
        var service = childProvider.GetService(serviceType);
        if (service is not null)
            return service;
        return hostProvider.GetService(serviceType);
    }

    public void Dispose() => childProvider.Dispose();
}

Child-first, host-fallback. The plugin sees its own TelegramBotClient and the host's ILoggerFactory, and on unload we dispose the child container without touching the host's. We don't duplicate the host's entire DI graph per plugin, which would be both slow and a memory problem at the scale we run.

The child container is also where per-plugin isolation gets enforced. When we build it, we inject a config provider and a data store that are bound to this plugin's identity:

var scopedConfig = new ScopedPluginConfigProvider(
    hostProvider.GetRequiredService<PluginConfigProviderBacking>(), plugin.PackageId);
var scopedData = new ScopedPluginDataStore(
    hostProvider.GetRequiredService<PluginDataStoreBacking>(), plugin.PackageId);
childServices.AddSingleton<IPluginConfigProvider>(scopedConfig);
childServices.AddSingleton<IPluginDataStore>(scopedData);

A plugin calls dataStore.SetAsync("last-seen-uid", "42") and the host scopes that key under the plugin's package ID before it hits the database. The plugin never supplies its own identity, so it can't read or write another plugin's data even if it tries. The contract's XML doc says it plainly: "The host binds the plugin identity at registration time; plugins never supply their own ID." That sentence is a security boundary, not a convenience.
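To make the scoping concrete, here is a small in-memory sketch. InMemoryBacking and ScopedStoreSketch are hypothetical stand-ins, not the real PluginDataStoreBacking or ScopedPluginDataStore; the point is only that the plugin-facing surface has no parameter for the plugin's identity.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical shared backing store: every row is keyed by (pluginId, key).
public sealed class InMemoryBacking
{
    private readonly ConcurrentDictionary<(string PluginId, string Key), string> _rows = new();

    public Task SetAsync(string pluginId, string key, string value)
    {
        _rows[(pluginId, key)] = value;
        return Task.CompletedTask;
    }

    public Task<string?> GetAsync(string pluginId, string key) =>
        Task.FromResult<string?>(_rows.TryGetValue((pluginId, key), out var v) ? v : null);
}

// What a plugin receives: the identity is bound by the host at construction.
public sealed class ScopedStoreSketch
{
    private readonly InMemoryBacking _backing;
    private readonly string _pluginId;

    public ScopedStoreSketch(InMemoryBacking backing, string pluginId)
    {
        _backing = backing;
        _pluginId = pluginId;
    }

    // No pluginId parameter: the plugin cannot address another plugin's rows.
    public Task SetAsync(string key, string value) => _backing.SetAsync(_pluginId, key, value);
    public Task<string?> GetAsync(string key) => _backing.GetAsync(_pluginId, key);
}
```

Two scoped stores over the same backing can both write "last-seen-uid" without seeing each other's value.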

Two ways in: startup and hot-install

Plugins arrive on two paths, and they need different registration code. Bundled plugins are discovered on a disk scan before the host's container is built, so their contributions go straight into the IServiceCollection:

foreach (var trigger in contributions.Triggers)
{
    services.AddSingleton<ITriggerPlugin>(trigger);
    plugin.ContributedTriggerTypes.Add(trigger.TriggerType);
}

A plugin installed from the admin UI while the host is running can't touch the sealed container. Those contributions go into mutable registries that the rest of the system reads through:

foreach (var trigger in contributions.Triggers)
{
    registries.Triggers.Register(trigger);
    plugin.ContributedTriggerTypes.Add(trigger.TriggerType);
}

The same scan-and-instantiate logic feeds both paths, so a plugin behaves identically whether it shipped in the box or got installed at 3pm on a Tuesday.
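A mutable registry of this kind can be very small. The sketch below is an assumption about shape, not our actual registries type: MutableRegistry<T> is a hypothetical name, and it takes the key explicitly where the real Register call derives it from the contribution.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical thread-safe registry that consumers read through at call time.
public sealed class MutableRegistry<T> where T : class
{
    private readonly ConcurrentDictionary<string, T> _items =
        new(StringComparer.OrdinalIgnoreCase);

    // Returns false if the key is already taken, so duplicates are visible to the caller.
    public bool Register(string key, T item) => _items.TryAdd(key, item);

    // Called when a plugin is uninstalled or unloaded.
    public bool Unregister(string key) => _items.TryRemove(key, out _);

    public T? Find(string key) => _items.TryGetValue(key, out var item) ? item : null;

    // A point-in-time copy, safe to enumerate while installs continue.
    public IReadOnlyList<T> Snapshot() => _items.Values.ToList();
}
```

Consumers that resolve through the registry on each use, instead of capturing a list at startup, pick up a hot-installed contribution on their next lookup.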

Not everything can hot-install, and we decided to be loud about that rather than fake it. Some contribution types (the LLM client, the MCP transport, the embedding client, the document text extractor) are wired so deep into the container that swapping them at runtime isn't safe. The installer detects them and refuses to pretend:

if (typeof(ILlmClient).IsAssignableFrom(type))
    restartRequired.Add((typeof(ILlmClient), type));
if (typeof(IMcpTransport).IsAssignableFrom(type))
    restartRequired.Add((typeof(IMcpTransport), type));
// ...later, on the runtime path:
logger.LogWarning("Plugin {Pkg}: contains {Interface} {Type} — requires app restart to activate",
    plugin.PackageId, iface.Name, concreteType.Name);

An operator who installs an LLM-provider plugin gets a clear "restart to activate" message instead of a feature that silently does nothing. Honest limits beat magic that fails quietly.

War story: the engine that idled at 2 GB

Scripting engines are a contribution point like any other. IScriptingEngine sits in the shared-types list, and our built-in Roslyn engine implements it the same way a third party would. It compiles a user's C# skill body into a runnable script and caches the result. For months it worked and nobody looked at it. Then we checked the host's memory at idle and found it holding 2.2 GB before serving a single request.

The cache was the culprit, and the bug was in its key. We cached compiled scripts by (scriptId, version):

private readonly ConcurrentDictionary<(Guid, int), Script<SkillResult>> _cache = new();

That was reasonable until you remember DAP is multi-tenant and we seed the same system and plugin skills into every tenant. The same skill body got compiled once per tenant, because each tenant's copy carried a different scriptId. Every entry pinned its own CSharpCompilation, and a CSharpCompilation holds a full symbol model over every referenced assembly, roughly 20 MB apiece. Multiply 20 MB by every skill across every tenant and the idle number stops being a mystery.

The fix was to key the cache by what actually determines the compiled output, not by who owns the row. We hash the body, the input schema, and the credential aliases, and let identical content share one compilation:

private readonly ConcurrentDictionary<string, Script<SkillResult>> _cache = new();
private readonly ConcurrentDictionary<(Guid, int), string> _identityToHash = new();

private static string ComputeContentHash(SkillScript script, IReadOnlyList<string>? credAliases)
{
    const string Sep = "\u001F";   // non-printing delimiter (ASCII unit separator here) that a script body cannot contain
    var sb = new StringBuilder();
    sb.Append(script.Body ?? string.Empty).Append(Sep);
    sb.Append(script.InputSchemaJson ?? string.Empty).Append(Sep);
    if (credAliases is not null)
        foreach (var a in credAliases.OrderBy(static x => x, StringComparer.Ordinal))
            sb.Append(a).Append(Sep);
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
    return Convert.ToHexString(bytes);
}

_identityToHash keeps callers addressing the cache by (scriptId, version) as before, while the heavy compilation lives once under its content hash. The same skill across forty tenants now compiles a single time.

Sharing a compilation means we can't evict it the moment one tenant's row disappears. So eviction became reference-aware: drop the identity's mapping, and discard the compilation only when no other identity still points at it.

public void Evict(Guid scriptId, int version)
{
    if (_identityToHash.TryRemove((scriptId, version), out var hash)
        && !_identityToHash.Values.Contains(hash))
    {
        _cache.TryRemove(hash, out _);
    }
}

Dedup got us most of the way. The next slice came from compilations we should never have built. System skills are seeded as Roslyn rows, but their real logic runs in compiled ISkill classes; the stored body is a placeholder comment that never executes as a script. Warm-up was compiling roughly twenty of them anyway, each pinning its own 20 MB model for a script no one would ever run. We skipped them:

scripts = scripts.Where(s => !s.IsSystem).ToList();

One fix we tried made things worse, and that's the part worth remembering. We assumed the symbol model was heavy because each compilation referenced too many assemblies, so we added a denylist to trim the metadata references. RAM went up. Roslyn shares metadata across compilations that reference the same assembly objects, and our trimming broke that sharing and raised the per-compilation cost. We reverted it, kept the full reference set, and added Count and TotalBytes diagnostics so the next person profiles before they prune.

A smaller bug hid in the new caching path, and it's easy to write. Our first version mapped the identity to its content hash before the compile finished. When a compile failed, the identity was left pointing at a cache entry that never got created, and the next execute threw KeyNotFoundException instead of recompiling. We moved the mapping to after a successful compile, so a failure leaves the identity unmapped and the next call retries cleanly:

_cache[contentHash] = compiled;
_identityToHash[identity] = contentHash;   // only reached on a successful compile
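Pulled together, the pieces fit in a few dozen lines. This is a simplified sketch, not the engine's code: ContentKeyedCache<T> is a hypothetical name, the compile step is a caller-supplied delegate, and the hash covers a single content string rather than body, schema, and aliases.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

public sealed class ContentKeyedCache<T> where T : class
{
    private readonly ConcurrentDictionary<string, T> _byHash = new();
    private readonly ConcurrentDictionary<(Guid, int), string> _identityToHash = new();

    public int Count => _byHash.Count;

    public T GetOrCompile(Guid id, int version, string content, Func<string, T> compile)
    {
        var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(content)));

        // If compile throws, nothing has been recorded, so the next call retries cleanly.
        // Note: under contention GetOrAdd may invoke the factory more than once.
        var compiled = _byHash.GetOrAdd(hash, _ => compile(content));

        // Map the identity only after the artifact exists.
        _identityToHash[(id, version)] = hash;
        return compiled;
    }

    public void Evict(Guid id, int version)
    {
        // Discard the shared artifact only when no other identity still points at it.
        if (_identityToHash.TryRemove((id, version), out var hash)
            && !_identityToHash.Values.Contains(hash))
        {
            _byHash.TryRemove(hash, out _);
        }
    }
}
```

Two identities with the same content trigger one compile and hold one entry; evicting the first leaves the entry in place, and evicting the second removes it.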

Idle dropped from 2.2 GB to about 450 MB. The lesson outlives Roslyn: when a contribution caches an expensive artifact in a multi-tenant host, key the cache by the artifact's content, not by the tenant row that asked for it, or you pay for every duplicate.

Firing agents without a database

A trigger's whole job is to run agents when something happens outside DAP. The naive design hands the plugin a repository and a ChatHandler and lets it do the work. We didn't want a third party holding a database connection or reaching into the chat pipeline, so we gave triggers one narrow surface, a context object with a handful of host-mediated calls:

public interface ITriggerContext
{
    Task FireAgentsAsync(
        string triggerType,
        string message,
        string? payloadJson = null,
        List<Guid>? attachmentIds = null,
        TriggerInvocationFacts? displayFacts = null,
        CancellationToken ct = default);

    Task<IReadOnlyList<TriggerConfigSnapshot>> GetActiveConfigsAsync(
        string triggerType, CancellationToken ct = default);

    Task<Guid> StoreFileAsync(int tenantId, string fileName,
        string contentType, byte[] data, CancellationToken ct = default);
    // RunAgentForConfigAsync, ResumeAgentAsync for chat transports
}

The FireAgentsAsync implementation does nothing more than publish a message onto our Wolverine bus and let the host decide which agents run:

public async Task FireAgentsAsync(string triggerType, string message, /* ... */)
{
    await using var scope = scopeFactory.CreateAsyncScope();
    var bus = scope.ServiceProvider.GetRequiredService<IMessageBus>();
    await bus.PublishAsync(new FirePluginTriggerEvent(
        triggerType, message, payloadJson, null, attachmentIds, displayFacts?.Values));
}

The plugin says "a Telegram message arrived, here's the text." The host owns everything after that: which agents subscribe to that trigger type, which tenant they belong to, how the run is scheduled. A plugin can't accidentally bypass our scheduling or leak across tenants because it was never handed the tools to.

GetActiveConfigsAsync shows the other side of that wall. A trigger supervisor runs in the background and has to see every tenant's configs to know what to poll, but our row-level security would normally hide them. So the host, not the plugin, briefly elevates:

using var _ = tenantContext.PushSuperAdmin();
await using var scope = scopeFactory.CreateAsyncScope();
var configRepo = scope.ServiceProvider.GetRequiredService<IAgentTriggerConfigRepository>();
// fetch configs + resolve each config's credentials

The elevation lives in host code the plugin can't reach. The plugin receives a TriggerConfigSnapshot with the credentials it needs and nothing more.

The lifecycle chicken and egg

Here's the bug that cost us an afternoon. Plugins can implement IDAPEventHandler<T>, and our installer pre-instantiates those handlers during the disk scan so they're ready the moment the host starts publishing events. Some of those handlers want an ITriggerContext injected. But the real TriggerContext depends on the main container's IServiceScopeFactory, and during the disk scan the main container doesn't exist yet. The handler needs a service that can't be built until after the handler is built.

We broke the cycle with a deferred adapter. It implements ITriggerContext, gets registered in the bootstrap container so handlers can take it as a constructor dependency, and forwards to the real implementation once the host binds it:

public sealed class DeferredTriggerContext : ITriggerContext
{
    private ITriggerContext? _inner;

    public void Bind(ITriggerContext inner)
    {
        ArgumentNullException.ThrowIfNull(inner);
        if (_inner is not null && !ReferenceEquals(_inner, inner))
            throw new InvalidOperationException(
                "DeferredTriggerContext is already bound to a different instance.");
        _inner = inner;
    }

    private ITriggerContext Inner => _inner
        ?? throw new InvalidOperationException(
            "ITriggerContext was invoked before the host container finished initializing. " +
            "Plugin event handlers must not call ITriggerContext during construction.");

    public Task FireAgentsAsync(string triggerType, string message, /* ... */)
        => Inner.FireAgentsAsync(triggerType, message, /* ... */);
    // every other member forwards to Inner the same way
}

The value is in the error message. A plugin author who fires a trigger from a constructor, before the host has finished starting, gets a sentence that names the mistake and the rule. Without the guard, the same mistake surfaces as a null reference three frames deep in our forwarding code, and the author has no idea the problem is when they called, not what they called. We spend a guard clause to buy a debuggable failure.

Config forms the plugin never has to draw

A plugin needs configuration, and we were not going to ask plugin authors to write React. So a plugin declares its fields, and the host renders the form. The field type is rich enough to drive real UI:

public sealed record PluginConfigField(
    string Key,
    string Label,
    string? Description,
    PluginConfigFieldType FieldType,
    bool IsRequired,
    string? DefaultValue,
    IReadOnlyList<string>? SelectOptions,
    FieldVisibility? VisibleWhen = null);

public enum PluginConfigFieldType { Text, Password, Number, Bool, Select, OAuthButton }

VisibleWhen was the field that made this feel less like a config dump and more like a form. The IMAP plugin authenticates with a password or with Microsoft OAuth2, and it would be noise to show password and OAuth fields at the same time. So its credential schema hides the OAuth alias field unless the auth method is set to oauth2:

new PluginConfigField(
    Key: "OAuthCredentialAlias", Label: "OAuth Credential", /* ... */,
    VisibleWhen: new FieldVisibility("AuthMethod", ["oauth2"])),

The frontend reads the same schema as a typed structure and builds the form, conditional visibility and all. The backend describes intent; the frontend owns rendering. A plugin in a NuGet package changes its config form by changing a record, and the UI follows with no frontend deploy.
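The visibility rule itself is simple enough to state in a few lines. The sketch below expresses it in C# for consistency with the rest of this post, although the real evaluation happens in the frontend. The record shape is an assumption: we only showed the constructor call above, so DependsOnKey and AllowedValues are hypothetical property names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape, inferred from new FieldVisibility("AuthMethod", ["oauth2"]).
public sealed record FieldVisibilitySketch(string DependsOnKey, IReadOnlyList<string> AllowedValues);

public static class VisibilityRule
{
    // A field is shown when it has no rule, or when the controlling field's
    // current value is one of the allowed values.
    public static bool IsVisible(
        FieldVisibilitySketch? rule,
        IReadOnlyDictionary<string, string> currentValues)
    {
        if (rule is null)
            return true;

        return currentValues.TryGetValue(rule.DependsOnKey, out var value)
            && rule.AllowedValues.Contains(value, StringComparer.Ordinal);
    }
}
```

With the IMAP rule, the OAuth alias field is visible while AuthMethod is "oauth2" and hidden for any other value, including when AuthMethod has not been set yet.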

Config that holds secrets needs one more thing: a way to tell a plugin its values changed without handing the plugin's neighbors those values. Our config provider exposes a deliberately value-free signal:

public interface IPluginConfigProvider
{
    string? GetValue(string key);
    Task<string?> GetValueAsync(string key, CancellationToken ct = default);

    /// <summary>Raised when this plugin's stored configuration changes. Re-read
    /// what you need through GetValue when it fires.</summary>
    event Action? Changed;
}

Changed carries no payload and fires only for the plugin it belongs to. A stateful plugin that built a client from its config subscribes, drops the cached client, and rebuilds it on next use from fresh values. A plugin that just calls GetValue on demand ignores the event entirely, because every read already sees the latest value. The signal never crosses plugin boundaries and never carries a secret over our event bus.
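A stateful consumer might look like the sketch below. IConfigSketch mirrors only the two members the example needs, InMemoryConfig is a test double, and the "client" is a plain string standing in for something expensive; all three are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public interface IConfigSketch
{
    string? GetValue(string key);
    event Action? Changed;
}

// Test double: raises Changed, with no payload, whenever a value is set.
public sealed class InMemoryConfig : IConfigSketch
{
    private readonly Dictionary<string, string> _values = new();
    public event Action? Changed;

    public string? GetValue(string key) => _values.TryGetValue(key, out var v) ? v : null;

    public void Set(string key, string value)
    {
        _values[key] = value;
        Changed?.Invoke();
    }
}

public sealed class CachedClient : IDisposable
{
    private readonly IConfigSketch _config;
    private readonly object _gate = new();
    private string? _client; // stand-in for an expensive client built from config

    public CachedClient(IConfigSketch config)
    {
        _config = config;
        _config.Changed += Invalidate;
    }

    // Rebuilds lazily from fresh values after an invalidation.
    public string Current
    {
        get { lock (_gate) { return _client ??= "client(" + _config.GetValue("ApiKey") + ")"; } }
    }

    private void Invalidate() { lock (_gate) { _client = null; } }

    // Unsubscribe so an unloaded plugin is not kept alive by the event.
    public void Dispose() => _config.Changed -= Invalidate;
}
```

The handler only drops the cached instance; the new value is read through GetValue on the next use, so nothing secret travels with the event.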

Letting a backend package style the chat

The hardest "no frontend code" problem was chat rendering. A Telegram message should show up in our chat view as a rich bubble with the sender's name and avatar, but the plugin that knows those facts is a backend NuGet package. We didn't want plugins shipping UI, and we didn't want the host hardcoding knowledge of every plugin's message shape.

We landed on a declarative display schema. A plugin describes the fields its invocations have and how each should render, and the frontend resolves that description against a facts dictionary at render time:

export type TriggerDisplaySurface = 'bubble' | 'summary' | 'both';

export interface TriggerDisplayField {
  kind: 'text' | 'code' | 'link' | 'chip' | 'avatar' | 'timestamp' | 'attachment' | 'summary';
  key: string;
  label: string;
  surface: TriggerDisplaySurface;
  urlKey?: string | null;
  nameKey?: string | null;
  emailKey?: string | null;
  format?: string | null;
  // ...
}

When the plugin fires an agent, it passes the actual values alongside the message:

public sealed record TriggerInvocationFacts(IReadOnlyDictionary<string, string> Values);

The plugin's DisplayFields say "render an avatar from senderName, a chip from channel, a timestamp from receivedAt." The TriggerInvocationFacts carry senderName = "Alice", channel = "support", and so on. The frontend matches keys to the schema and draws the bubble. A new plugin gets rich chat rendering by returning a list of field descriptors, and our chat view never learns its name. Plugins that predate the feature return an empty list and keep rendering as plain text, the same default-interface trick that let us add it without a migration.
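The matching step amounts to a lookup per field. Here is a sketch of it in C#, with two assumptions labeled up front: DisplayFieldSketch keeps only kind, key, and label from the schema above, and a field whose key has no fact is skipped, which is our guess at a sensible rule, not a statement of what the chat view does.

```csharp
using System.Collections.Generic;

// Hypothetical, trimmed mirror of the display field descriptor.
public sealed record DisplayFieldSketch(string Kind, string Key, string Label);

public static class BubbleResolver
{
    // Pairs each declared field with the value the plugin supplied for its key.
    public static IReadOnlyList<(string Kind, string Label, string Value)> Resolve(
        IEnumerable<DisplayFieldSketch> fields,
        IReadOnlyDictionary<string, string> facts)
    {
        var resolved = new List<(string Kind, string Label, string Value)>();
        foreach (var field in fields)
        {
            // Assumption: fields without a matching fact are omitted.
            if (facts.TryGetValue(field.Key, out var value))
                resolved.Add((field.Kind, field.Label, value));
        }
        return resolved;
    }
}
```

The resolver knows nothing about Telegram or any other plugin; it sees descriptors and a dictionary, which is what keeps plugin names out of the chat view.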

Testing a plugin without a host

A plugin author shouldn't need a Postgres instance and a message bus to test a trigger. The SDK ships a fake context that records every fire call for assertion:

public sealed class FakeTriggerContext : ITriggerContext
{
    public List<FiredTrigger> Fires { get; } = [];
    public List<TriggerConfigSnapshot> ActiveConfigs { get; set; } = [];

    public Task FireAgentsAsync(string triggerType, string message, /* ... */)
    {
        Fires.Add(new FiredTrigger(triggerType, message, /* ... */));
        return Task.CompletedTask;
    }
    // ...
}

A fluent builder wires the fakes into a container so a test resolves the plugin the way the host would:

var host = new PluginTestHostBuilder()
    .WithConfig("ApiKey", "test-key")
    .WithTriggerConfig(snapshot)
    .WithBootstrapper<TelegramPluginBootstrapper>()
    .Build();

var trigger = host.Create<MyTriggerPlugin>();
await trigger.StartAsync(host.TriggerContext, CancellationToken.None);

Assert.Single(host.TriggerContext.Fires);

The author asserts against recorded fire calls and stored files, with no database and no bus. We shipped this in the same SDK package as the contracts, because a plugin contract you can't easily test is a contract people will get wrong.

What we'd tell the next team

A handful of things held up across every plugin we built. Decide the shared-types boundary first, because it's the ABI and every cast across the load-context boundary depends on it. Lean on default interface methods, since they're how you grow a plugin contract without a coordinated upgrade. Be explicit about what can't hot-install instead of faking it, because an operator forgives a restart prompt and never forgives a feature that silently does nothing. Push isolation into DI scoping rather than convention, so a plugin's data and config are walled off by construction instead of by everyone remembering to be careful. And when a contribution caches something expensive, key it by content rather than by the tenant that asked for it, or a multi-tenant host quietly pays for every duplicate.

The payoff shows up on release day. When a customer needs a new integration now, we write a plugin, publish a package, and they install it against the running platform. The core stays still, and the edges move as fast as the people building them.
