Blog

  • Classifying Language Models along Autonomy & Trust Levels

    The Problem

    Language models are everywhere now. People praise them but also complain about their responses: they are unreliable, they hallucinate, and they cannot be left to work alone. These systems, capable of understanding and generating human-like text, are often called copilots, a term borrowed from aviation and car racing. The term indicates that their main expected role is to support the pilot.

    But how do we actually classify what these models can do? And more importantly, how much can we trust them?

    A Hybrid Classification Framework

    Drawing inspiration from the SAE levels of driving automation and grounded in human-computer interaction research on trust in automation, we propose a two-dimensional framework for classifying language models:

    1. Operational Autonomy – adapted from SAE Levels (0–5): What can the model do on its own?
    2. Cognitive Trust and Delegation – how much mental effort does the user expend, and how much responsibility is delegated?

    Each level in the chart below reflects both dimensions.

    Level | Autonomy Description | Trust/Delegation Role
    0 – Basic Support | Passive tools like spellcheckers; no real autonomy | No Trust: User must fully control and interpret everything
    1 – Assisted Generation | Suggests words or phrases (autocomplete); constant oversight needed | Suggestive Aid: User supervises and approves each suggestion
    2 – Semi-Autonomous Text Production | Generates coherent content from prompts (emails, outlines); needs close supervision | Co-Creator: User relies in low-stakes tasks but reviews all outputs
    3 – Context-Aware Assistance | Can handle structured tasks (e.g., medical summaries); users remain alert | Delegate: User lets go during routine tasks but monitors for failure
    4 – Fully Autonomous Within Domains | Works independently in narrow contexts (e.g., customer service bot) | Advisor: Trusted within scope; user rarely intervenes
    5 – General Language Agent | Hypothetical general-purpose assistant capable across domains without oversight | Agent: Fully trusted to operate independently and responsibly

    Why SAE Levels Make Sense

    Although language models do not act in the physical world, it still makes sense to compare them to autonomous vehicles in terms of their capabilities and limitations. The SAE classification helps clarify expectations, safety considerations, and technological milestones.

    Let’s first briefly revisit what each SAE level entails for automobiles:

    • Level 0 (No Automation): The human driver does everything; no automation features assist with driving beyond basic warnings.
    • Level 1 (Driver Assistance): The vehicle offers assistance with either steering or acceleration/deceleration but requires constant oversight.
    • Level 2 (Partial Automation): The system can manage both steering and acceleration but still requires the human to monitor closely.
    • Level 3 (Conditional Automation): The vehicle handles all aspects of driving under specific conditions; the human must be ready to intervene if necessary.
    • Level 4 (High Automation): The car can operate independently within designated areas or conditions without human input.
    • Level 5 (Full Automation): Complete autonomy in all environments—no human intervention needed.

    Adapting Levels to Language Models

    Level 0: Basic Support

    At this foundational level, language models serve as simple tools—spell checkers or basic chatbots—that provide minimal assistance without any real understanding or autonomy. They do not generate original content on their own but act as aids for humans who make all decisions.

    Example: Elementary grammar correction programs that flag mistakes but don’t suggest nuanced rewrites.

    Level 1: Assisted Generation

    Moving up one step, some language models begin offering suggestions based on partial input. For example, autocomplete functions in email clients that predict next words or phrases fall into this category—they assist but require constant supervision from users who must review outputs before accepting them.

    Example: Gmail’s smart compose feature.

    Level 2: Semi-Autonomous Text Production

    At this stage, models can generate longer stretches of coherent text when given prompts—think about AI tools that draft emails or outline articles—but they still demand continuous oversight. Users need to supervise outputs actively because errors such as factual inaccuracies or inappropriate tone remain common pitfalls.

    Example: ChatGPT generating email drafts or article outlines.

    Level 3: Context-Aware Assistance

    Now we reach an intriguing analogy with conditional automation: AI systems can handle complex tasks within certain constraints, which lets humans step back temporarily while remaining alert for potential issues. Large language models operating at this level might manage summarization tasks in specific domains (e.g., medical summaries) but could falter outside their trained scope.

    Example: Medical AI assistants that can summarize patient records but require doctor oversight.

    Level 4: Fully Autonomous Within Domains

    Imagine an AI-powered assistant capable of managing conversations entirely within predefined contexts, say customer service bots handling standard inquiries autonomously within specified industries, but unable to operate beyond those limits without retraining or manual intervention.

    Example: Customer service chatbots for specific industries like banking or retail.

    Level 5: Fully Autonomous General Language Understanding

    Envisioning true “full autonomy” for language models means creating systems that understand context deeply across countless topics and produce accurate responses everywhere, even without prompting from humans if desired. While such systems remain theoretical today, research aims at general-purpose AI assistants capable not only of conversing fluently across domains but also of doing so responsibly without oversight.

    Example: Theoretical future AI systems that could operate across all domains without human oversight.

    Current State and Implications

    Now that we have a classification framework, let’s examine where we stand today and what this means for practical applications.

    Most contemporary large-scale language models sit somewhere around Level 2 or early Level 3: they generate impressive content when given prompts, yet they still struggle with consistency outside narrow contexts and require vigilant supervision by humans who critically evaluate accuracy.

    However, there’s an important limitation to the SAE analogy that we need to address.

    The Trust Dimension

    While the SAE levels offer a useful metaphor for understanding increasing autonomy, they aren’t a perfect fit for language models because:

    • Language models don’t act in the physical world themselves—humans interpret and act on their outputs
    • Risk and impact in NLP are mediated by human cognition and behavior, unlike the immediate physical risks of self-driving cars
    • Autonomy in NLP often deals more with semantic understanding, trustworthiness, context handling, and ethical alignment than sensor-actuator loops

    Therefore, we also propose a mapping of the SAE levels to trust levels that takes cognitive load and responsibility into account:

    • Level 0: No trust: tool offers isolated corrections, requires full user oversight (spellcheck)
    • Level 1: Suggestive aid: user must review and approve every suggestion (autocomplete)
    • Level 2: Co-creator: user maintains active oversight, only defers in low-stakes contexts (drafting emails)
    • Level 3: Delegate: user maintains regular oversight with frequent spot checks and validation (10-20% review)
    • Level 4: Advisor: user maintains strategic oversight with periodic reviews (5-10% audit), especially for high-stakes outputs
    • Level 5: Agent: user maintains governance oversight with systematic audits (1-5% review) despite autonomous operation
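
    The review shares in the list above can be turned into a simple sampling rule. The following is a minimal sketch, not a reference implementation: the names `TrustLevel`, `TRUST_LEVELS`, and `select_for_review` are made up for illustration, levels 0 to 2 are assumed to mean that every output is reviewed, and the upper bound of each range is used as the cautious default.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustLevel:
    level: int
    role: str
    review_min: float  # lower bound of the share of outputs a human reviews
    review_max: float  # upper bound of that share


# Shares for levels 3-5 are taken from the list above. Levels 0-2 are an
# assumption: the user checks every output, so the share is modeled as 100%.
TRUST_LEVELS = {
    0: TrustLevel(0, "No trust", 1.0, 1.0),
    1: TrustLevel(1, "Suggestive aid", 1.0, 1.0),
    2: TrustLevel(2, "Co-creator", 1.0, 1.0),
    3: TrustLevel(3, "Delegate", 0.10, 0.20),
    4: TrustLevel(4, "Advisor", 0.05, 0.10),
    5: TrustLevel(5, "Agent", 0.01, 0.05),
}


def select_for_review(outputs, level, rng=None):
    """Pick a random sample of outputs for human review.

    Uses the upper bound of the level's review share and always selects
    at least one output when there are any.
    """
    outputs = list(outputs)
    if not outputs:
        return []
    rng = rng or random.Random()
    share = TRUST_LEVELS[level].review_max
    count = max(1, round(len(outputs) * share))
    return rng.sample(outputs, count)
```

    For example, `select_for_review(drafts, level=3)` would hand roughly one in five drafts to a human reviewer, while the same call with `level=5` would select about one in twenty.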

    Practical Implications

    Classifying language models along SAE-like levels provides practical benefits:

    1. Common vocabulary for developers, researchers, policymakers, and end-users
    2. Realistic expectations about capabilities: the difference between tools that assist with writing and systems that fully automate complex decision-making processes
    3. Regulatory guidance for ensuring safe deployment at each stage
    4. Awareness of effort: the effort needed to reach each further level increases, probably exponentially

    Design Priorities

    Categorizing these technologies is not just of academic interest; the resulting clarity also informs design priorities:

    • Should future efforts focus on improving reliability before granting more independence?
    • How do safety concerns evolve as we move up each level?
    • What ethical considerations arise when deploying increasingly autonomous NLP systems?

    Each incremental step toward higher levels demands careful consideration regarding:

    • Transparency: Can users understand when they’re interacting with an assistant versus an agent?
    • Accountability: Who bears responsibility if an AI-generated statement causes harm?

    Conclusion

    Applying SAE-level classifications offers more than just terminology: it provides a roadmap illustrating how far we’ve come and how much further we need to go in developing intelligent language systems capable not only of mimicking human conversation but also of doing so responsibly across diverse environments.

    Recognizing where current technology resides on this spectrum enables us all—from engineers designing smarter assistants to regulators crafting informed policies—to make conscious choices grounded in realistic assessments rather than hype or fear.

    As artificial intelligence continues its ascent along these levels—from rudimentary support towards full autonomy—the journey will demand ongoing collaboration among technologists, ethicists, policymakers, and ultimately society itself to ensure these powerful tools serve humanity’s best interests every step along the way.

    References

    SAE J3016™. “Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles.” First published: 2014. Most recent version (as of 2024): SAE J3016_202104 (April 2021). https://www.sae.org/standards/content/j3016_202104/

    Hoffman, R. R., Johnson, M., Bradshaw, J. M., & Underbrink, A. (2013). “Trust in automation.” IEEE Intelligent Systems, 28(1), 84–88. DOI: 10.1109/MIS.2013.24

  • AI in IT

    There are a great many interesting resources out there.

    AI Prompting and APIs

    Always use a Pro, Teams, or Enterprise license
    It’s a head-to-head race; best in class changes all the time
    Opinion: Microsoft Copilot is last in class
    Large enterprises have already built their own company AI platforms that support switching between LLMs and provide internal APIs

    AI IDE

    IDEs putting AI first are spreading
    It’s totally different from simple prompting, much more powerful

    AI Workflow

    It all started with LangChain, but now graphical AI workflow tools are evolving

    AI Local LLMs

    Don’t trust AI providers that use your data for training
    Either pay a provider for privacy or set up local LLMs

    AI inside Apps

    The next wave is coming and it’s big
    Apps are integrating AI

  • Executive Summary: Application Lifecycle in EAM

    #architecture #clarity #velocity #direction

    Application Lifecycle Management (ALM) in LeanIX is a central part of Enterprise Architecture Management (EAM). It enables companies to manage and optimize the entire lifecycle of their applications effectively. This process covers all phases, from planning and development through operation to the retirement of applications.

    As an EAM tool, LeanIX offers extensive functions to support application owners in managing their applications. It provides a holistic view of the IT landscape and helps identify dependencies, risks, and optimization potential.

    In this post, we first explain the importance of ALM for application owners and then present concrete suggestions for improving its implementation in LeanIX. The goal is to increase the efficiency and effectiveness of Application Lifecycle Management and thereby create greater value for the company.

    Raising Awareness Among Application Owners

    Some application owners are responsible for several applications and believe they do not need LeanIX. To convince them of the importance of EAM in general and of the tool in particular, the following test questions focusing on architecture, processes, and data can be asked:

    a) Architecture-related questions:

    • How quickly can you find out which of your applications would be affected by a planned infrastructure change?
    • Which of your applications use outdated technologies and need to be modernized in the near future?

    b) Process-related questions:

    • How would you describe the influence of one of your applications on the company’s entire value chain?
    • If one of your applications fails: how quickly can you identify all affected business processes?

    c) Data-related questions:

    • Can you sketch the processed data entities and their data flows for each of your applications?
    • Can you state ad hoc which of your applications process personal data and how that data is protected?

    d) Cross-cutting questions:

    • How quickly can you compile all relevant information about your applications in response to an audit request?
    • How do you ensure that all stakeholders are always informed about the current status of, and planned changes to, your applications?
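
    The first architecture and process questions above boil down to lookups over a maintained inventory. The sketch below illustrates this under stated assumptions: `INVENTORY`, its application names, components, and processes are invented sample data, and the structure is a plain dictionary, not the LeanIX data model or API.

```python
# Hypothetical inventory: each application lists the IT components it
# runs on and the business processes it supports.
INVENTORY = {
    "Webshop": {
        "components": {"Kubernetes", "PostgreSQL"},
        "processes": {"Order to Cash"},
    },
    "Billing": {
        "components": {"PostgreSQL"},
        "processes": {"Order to Cash", "Dunning"},
    },
    "Intranet": {
        "components": {"Windows Server"},
        "processes": {"Internal Communication"},
    },
}


def apps_affected_by(component, inventory):
    """Applications that depend on the given infrastructure component."""
    return sorted(
        name for name, app in inventory.items() if component in app["components"]
    )


def processes_affected_by_outage(app_name, inventory):
    """Business processes supported by the given application."""
    return sorted(inventory[app_name]["processes"])
```

    With such data in place, a planned change to "PostgreSQL" or an outage of "Billing" can be answered with a single call each, which is the kind of response time the questions are probing for.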

    Suggestions for Improving Application Lifecycle Management in LeanIX

    To support application owners in maintaining their applications in LeanIX, to align the enterprise architecture more closely with the business, and to make use of the link to data management, I propose the following concrete activities as a basis for discussion:

    1. Training and workshops for application owners:
      • Organize regular training sessions on LeanIX and best practices
      • Run workshops that clarify the relationship between applications, business processes, and data
      • Create practical guides and checklists for maintaining applications in LeanIX in an easily accessible tool such as Confluence
      • Create LeanIX surveys that let application owners provide relevant information simply by answering tailored questionnaires
    2. Process-oriented modeling in LeanIX:
      • Implement a process-oriented view in LeanIX
      • Link applications to the business processes they support
      • Visualize the contribution of each application to the value chain
    3. Integration of data management aspects:
      • Extend the LeanIX metamodel with relevant data management attributes
      • Link applications to the data entities they process
      • Implement data flow diagrams that show the relationship between applications and data
    4. Automation and integration:
      • Implement interfaces between LeanIX and other relevant tools (e.g., BPM, Data Management Platform)
      • Automate the updating of basic information in LeanIX
      • Create dashboards that visualize maintenance status and data quality
    5. Governance and incentives:
      • Establish clear responsibilities and SLAs for maintaining application information
      • Implement a reward system for application owners who keep their data up to date
      • Conduct regular reviews of the application landscape
    6. Data governance integration:
      • Link data governance roles (e.g., Data Owner, Data Steward) to the corresponding applications in LeanIX
      • Implement attributes for data classification and data protection requirements on applications
      • Create reports that show data governance aspects across the entire application landscape
    7. Continuous improvement:
      • Establish a regular feedback process with application owners
      • Analyze usage patterns in LeanIX to identify potential for improvement
      • Continuously adapt the metamodel and the processes based on the feedback
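
    The dashboard idea from item 4 (maintenance status and data quality) could be prototyped on exported application records before anything is built in the tool itself. This is only a sketch: the field names in `REQUIRED_FIELDS`, the 0.8 threshold, and the record layout are assumptions for illustration and do not reflect actual LeanIX attribute names.

```python
# Hypothetical attribute names; the fields of a real export will differ.
REQUIRED_FIELDS = (
    "owner",
    "business_processes",
    "data_entities",
    "data_classification",
    "lifecycle_end",
)


def completeness(record):
    """Share of required fields that are filled in (0.0 to 1.0)."""
    filled = sum(1 for name in REQUIRED_FIELDS if record.get(name))
    return filled / len(REQUIRED_FIELDS)


def maintenance_report(records, threshold=0.8):
    """Return (application name, score) pairs below the threshold, worst first."""
    scored = [(record["name"], completeness(record)) for record in records]
    below = [entry for entry in scored if entry[1] < threshold]
    return sorted(below, key=lambda entry: (entry[1], entry[0]))
```

    A report like this gives the regular reviews from item 5 a concrete starting list, and the same score could feed the incentives for owners who keep their data current.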
  • Using Cursor AI as Architect and Modeler

    Cursor AI for dev-aware Architects and Modelers, produced using DALL-E, 2024-10-03

    Cursor AI with GitHub significantly improves my personal productivity and service portfolio. In-place coding and writing/blogging in one tool, awesome.

    As a freelancer, I focus on enterprise and IT architecture, customizing methods and modeling languages, and integrating various tools such as LeanIX, ARIS, MagicDraw, Confluence, Jira, Xray, ALM, and so on.

    That said, programming can only be part of my job, and it easily eats into my overall availability. So I am really happy about any booster, be it other freelancers or better tooling. Moreover, since SysML v2 is slowly entering the stage as a code-centric modeling approach, I will be able to extend that productive approach even more.

    Cursor AI is so cool already, yet I would appreciate some improvements regarding different access scenarios:

    • IDE: Cursor AI on Windows and Linux Desktops (UI, great)
    • Code: store all in GitHub repositories (storage layer)
    • Notes / Knowledge Base: store all as markdown files in one separate repository in GitHub (storage layer)
    • IDE anywhere: VSCode app for Android on Tablet accessing GitHub until Cursor AI becomes available (UI, mobile)
    • Git anywhere: GitHub for Android (read, search, mobile)

    It would also be nice to be able to add your own local LLM to the list, improving data privacy even more.

    Google IDX beta also looks quite promising, based on VSCode for Web as well, but is sucking in all of your prompts and data…

  • Bridging Non-Technical and Technical Teams with Custom Modeling Languages: An E-Commerce Case Study

    From Product to Solution, produced using DALL-E, 2024-09-22

    In today’s fast-paced business environment, the synergy between non-technical (product) and technical teams is more crucial than ever. Yet, these teams often find themselves speaking different languages, leading to misunderstandings and project delays. How can organizations bridge this gap to foster better communication and collaboration?

    One effective approach is the use of custom modeling languages that incorporate concepts understood by both parties. By focusing on shared language elements and tracing ideas from rough concepts to detailed designs, teams can work more cohesively. This article explores how custom modeling languages centered around system objects, structure, and behavior can unite product and technical teams, using an e-commerce system as an example.

    The Power of Custom Modeling Languages

    Custom modeling languages serve as a common platform where both non-technical and technical teams can articulate and visualize system requirements and designs. These languages use intersecting concepts that are familiar to all stakeholders, facilitating clearer communication and reducing the risk of misunderstandings.

    Key Concepts:

    • System Objects: Fundamental elements that represent real-world entities within the system.
    • Structure: How system objects are organized, represented by product and solution blocks.
    • Behavior: How system objects act and interact, depicted through product and solution use cases.

    Intersecting Language Concepts

    System Objects with Structure and Behavior

    At the core of any system are the system objects, which possess both structure and behavior. By defining these objects, teams create a foundation that both sides understand.

    • Structure: Represents the static aspects—how components are organized.
    • Behavior: Represents the dynamic aspects—how components interact over time.

    Structure: Product and Solution Blocks

    • Product Blocks: High-level components that define what the system should do from a business perspective. For example, in an e-commerce system, this could be the “Shopping Cart” or “Product Catalog.”
    • Solution Blocks: Technical components that detail how the system will achieve the product requirements. This includes databases, servers, and application layers.

    Behavior: Product and Solution Use Cases

    • Product Use Cases: Scenarios that describe user interactions with the system, such as “Place an Order” or “Search for a Product.”
    • Solution Use Cases: Technical workflows that support product use cases, like “Process Payment Transaction” or “Update Inventory Database.”

    Product Level Modeling in an E-Commerce System

    At the product level, modeling focuses on capturing the business requirements and user interactions.

    Example: Customer Journey

    1. Browse Products: The customer explores the product catalog.
    2. Add to Cart: The customer selects items to purchase.
    3. Checkout: The customer provides payment and shipping information.
    4. Order Confirmation: The system confirms the order and provides tracking details.

    By mapping out these product use cases, non-technical teams can convey their needs clearly to technical teams.

    Solution Level Modeling

    At the solution level, the modeling becomes more detailed, incorporating components, classes, and methods that technical teams use to build the system.

    Example: Processing an Order

    1. Order Component: Manages order data and interactions.
      • Classes: Order, OrderItem, PaymentDetails
      • Methods: validateOrder(), processPayment(), updateInventory()
    2. User Component: Handles user authentication and profiles.
      • Classes: User, Address, Authentication
      • Methods: login(), logout(), updateProfile()

    By aligning these solution blocks with the product blocks, technical teams can ensure they are meeting the business requirements.
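
    To make the Order Component more tangible, here is a minimal sketch of how its classes and methods might look in code. Everything beyond the class and method names is an assumption for illustration: the fields, the validation rules, and the in-memory `stock` dictionary are invented, the payment step is only a placeholder for a real gateway call, and the method names follow Python's snake_case instead of the camelCase used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OrderItem:
    sku: str
    quantity: int
    unit_price: float


@dataclass
class PaymentDetails:
    method: str
    authorized: bool = False


@dataclass
class Order:
    items: List[OrderItem] = field(default_factory=list)
    payment: Optional[PaymentDetails] = None

    def validate_order(self) -> bool:
        """An order is valid if it has items and all quantities are positive."""
        return bool(self.items) and all(item.quantity > 0 for item in self.items)

    def process_payment(self) -> bool:
        """Placeholder: marks the payment as authorized for a valid order."""
        if not self.validate_order() or self.payment is None:
            return False
        self.payment.authorized = True  # stand-in for a real payment gateway
        return True

    def update_inventory(self, stock: Dict[str, int]) -> None:
        """Reduce stock for each item; refuse if unpaid or stock is short."""
        if self.payment is None or not self.payment.authorized:
            raise ValueError("payment not authorized")
        needed: Dict[str, int] = {}
        for item in self.items:
            needed[item.sku] = needed.get(item.sku, 0) + item.quantity
        # Check everything first so a failure leaves the stock untouched.
        for sku, quantity in needed.items():
            if stock.get(sku, 0) < quantity:
                raise ValueError(f"insufficient stock for {sku}")
        for sku, quantity in needed.items():
            stock[sku] -= quantity
```

    The three methods mirror the order in which the product use case "Place an Order" unfolds, which keeps the link between product and solution level visible in the code.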

    Tracing Ideas from Concept to Design

    The intersecting concepts allow for seamless tracing of requirements from initial ideas to detailed technical designs.

    • From Product Use Cases to Solution Use Cases: Each product scenario is linked to technical workflows.
    • From Product Blocks to Solution Components: Business components are mapped to their technical counterparts.
    • From System Objects to Classes and Methods: Objects defined at the product level are translated into classes and methods in the codebase.

    This traceability ensures that both teams are aligned throughout the project lifecycle.
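
    Such trace links can be kept as plain data and checked automatically. The sketch below assumes a simple dictionary from product use cases to solution use cases; the use case names come from the examples earlier in this article, but the links between them, the name `TRACE`, and both helper functions are invented for illustration.

```python
# Hypothetical trace links from product use cases to solution use cases.
TRACE = {
    "Place an Order": ["Process Payment Transaction", "Update Inventory Database"],
    "Search for a Product": [],
}


def untraced(product_use_cases, trace):
    """Product use cases that have no solution use case linked yet."""
    return [name for name in product_use_cases if not trace.get(name)]


def impacted_product_use_cases(solution_use_case, trace):
    """Product use cases that depend on the given solution use case."""
    return sorted(
        product for product, solutions in trace.items()
        if solution_use_case in solutions
    )
```

    The first helper shows gaps the technical team still has to close; the second answers the reverse question of which business scenarios are touched when a technical workflow changes.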

    Applying the Toolkit Strategies

    To further enhance collaboration, organizations can implement several strategies:

    Bridge Gaps

    Use the shared modeling language to facilitate communication. Regular meetings to discuss models can help both teams stay aligned.

    Discussions with business stakeholders should be based on the product modeling language, while discussions with technical stakeholders should be based on the solution modeling language. Joint discussions can focus on the transition from product to solution modeling to ensure that the business requirements are translated into technical requirements.

    Empathize More

    Encourage team members to understand each other’s perspectives. Non-technical staff can attend technical walkthroughs, while technical staff can participate in business requirement sessions.

    Define Roles

    Clearly outline who is responsible for each part of the modeling and solution process. This clarity prevents overlaps and confusion.

    For example, a product owner is responsible for the product aspects of the system, while a solution owner is responsible for the solution modeling. The former is therefore also responsible for product use cases and product blocks, and the latter for solution use cases and solution blocks.

    Foster Respect

    Acknowledge the expertise each team brings. Celebrate successes jointly to build mutual respect.

    Create Liaisons

    Appoint team members who are fluent in both product and technical aspects. These liaisons can translate and mediate between teams.

    Continuous Learning

    Promote ongoing education. Workshops and cross-training sessions can help team members appreciate the challenges and workflows of their counterparts.

    Conclusion

    Bridging the gap between non-technical and technical teams is not just about better communication — it’s about creating a cohesive environment where ideas flow seamlessly from concept to implementation. Custom modeling languages that use shared concepts like system objects, structure, and behavior can play a pivotal role in this process.

    By applying these principles and fostering a culture of empathy and continuous learning, organizations can enhance collaboration, drive innovation, and adapt more quickly to market demands. The e-commerce example illustrates how these concepts can be practically applied, but the approach is versatile enough to benefit projects across various industries.

    Future Enhancements and Sophistication

    While the toolkit provides a solid foundation for bridging product and technical teams, there’s room for further sophistication.

    At the product level, more detailed requirements modeling could be introduced, incorporating user stories, acceptance criteria, and business rules. Similarly, at the solution level, technical modeling could be expanded to include architectural patterns, data models, and API specifications.

    Underneath the solution level, additional layers of abstraction could be added, such as infrastructure modeling for cloud deployments or performance modeling for optimization.

    Moreover, extending the language concepts to include roles and interactions would provide a richer context for system behavior. Roles could represent different user types or system actors, while interactions could model the complex relationships between system components.

    These enhancements would create an even more comprehensive toolkit, enabling teams to model and communicate increasingly complex systems with greater precision and clarity.

    Final Annotations on how to use AI to Answer Questions

    My personal goal for this article was to answer a LinkedIn advice question (https://www.linkedin.com/advice/0/what-do-you-technical-non-technical-teams-clash-obaif) in no time by involving AI. It turned out not to be that easy this time. Most of the time was spent checking the result and tweaking the prompt half a dozen times to improve it. The reason seems to lie in the complexity of the request and the need to adapt to the limitations of the AI by working with two abstraction levels while also taking into account my personal experience from various projects. The situation should improve as the personal knowledge base accessible to the AI grows.

    Used approach

    • Copy the LinkedIn article into Cursor AI (https://www.cursor.com/) as a separate markdown file (could be improved by not having to copy every single section)
    • Create a prompt in Cursor AI in yet another file
    • Use chat in Cursor AI (well, half a dozen iterations) and save result in yet another markdown file
    • Manually improve the result with selective changes directly in Cursor AI (select text, Ctrl-K, command).

    A beautiful side effect is that the sections are answered in a connected way and not separately from each other. I hope this can be helpful for you as well. Thank you for reading, and forgive me for not having spent more time on a perfectly generated image.