SuperSonic (超音数)
SuperSonic is an out-of-the-box yet highly extensible framework for building data chatbots. It provides a chat interface that lets users query data in natural language and visualize the results with suitable charts. To enable this experience, the only prerequisite is to build logical semantic models (definitions of metrics, dimensions, and entities, along with their meaning, context, and relationships) on top of physical data models; the data itself does not need to be modified or copied. In addition, SuperSonic is designed to be pluggable: new functionality can be added through plugins, and core components can be integrated with other systems.
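To illustrate what a logical semantic model adds on top of a physical table, here is a minimal Python sketch. The classes `Metric`, `Dimension`, and `SemanticModel`, the `to_sql` method, and the `page_visits` table are hypothetical illustrations, not SuperSonic's actual API. The point is only that metrics and dimensions map business-facing names to expressions over existing columns, so SQL can be generated without touching or copying the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metric:
    name: str              # business-facing name, e.g. "page views"
    expr: str              # aggregation over physical columns, e.g. "COUNT(*)"
    description: str = ""  # meaning/context that a chatbot could use


@dataclass
class Dimension:
    name: str              # business-facing name
    column: str            # physical column that backs this dimension
    description: str = ""


@dataclass
class SemanticModel:
    table: str  # existing physical table; it is only queried, never copied
    metrics: Dict[str, Metric] = field(default_factory=dict)
    dimensions: Dict[str, Dimension] = field(default_factory=dict)

    def add_metric(self, metric: Metric) -> None:
        self.metrics[metric.name] = metric

    def add_dimension(self, dim: Dimension) -> None:
        self.dimensions[dim.name] = dim

    def to_sql(self, metric: str, group_by: List[str]) -> str:
        """Translate logical names into SQL over the physical table."""
        m = self.metrics[metric]
        cols = [self.dimensions[g].column for g in group_by]
        alias = metric.replace(" ", "_")
        select = ", ".join(cols + [f"{m.expr} AS {alias}"])
        sql = f"SELECT {select} FROM {self.table}"
        if cols:
            sql += " GROUP BY " + ", ".join(cols)
        return sql


# Hypothetical model over an existing table named "page_visits".
model = SemanticModel(table="page_visits")
model.add_metric(Metric("page views", "COUNT(*)", "number of page visits"))
model.add_dimension(Dimension("department", "dept_name", "visitor's department"))
print(model.to_sql("page views", ["department"]))
# SELECT dept_name, COUNT(*) AS page_views FROM page_visits GROUP BY dept_name
```

Keeping the description next to each metric and dimension is what would let a chatbot relate user wording to schema elements, while the expression and column fields are what a query generator would need.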
Motivation
The emergence of Large Language Models (LLMs) such as ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLMs to convert natural language queries into SQL queries. While some of this work shows promising results, it is still not applicable to real-world scenarios.
From our perspective, the key to closing the gap with real-world use lies in three aspects:
- Complement the LLM-based semantic parser with rule-based semantic parsers to improve efficiency (in terms of latency and cost).
- Augment semantic parsing with schema mappers (as a kind of preprocessor) and semantic correctors (as a kind of postprocessor) to improve accuracy and stability.
- Introduce a semantic layer that encapsulates the underlying data context (joins, formulas, etc.) to reduce complexity.
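One way to picture the first point is a cheap-first dispatch: try fast rule-based parsers, and call the LLM-based parser only when no rule matches with enough confidence. The sketch below is a hypothetical illustration, not SuperSonic's parser code. `ParseResult`, `metric_by_dimension_rule`, the 0.8 threshold, and the table name `t` are all assumptions, and the LLM call is a stub.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ParseResult:
    parser: str        # which parser produced the result
    sql: str
    confidence: float


Parser = Callable[[str], Optional[ParseResult]]
llm_calls: List[str] = []  # records each (simulated) LLM invocation


def metric_by_dimension_rule(query: str) -> Optional[ParseResult]:
    """Hypothetical rule: '<metric> by <dimension>' becomes a GROUP BY query."""
    match = re.fullmatch(r"(\w+) by (\w+)", query.strip().lower())
    if match is None:
        return None
    metric, dim = match.groups()
    sql = f"SELECT {dim}, SUM({metric}) FROM t GROUP BY {dim}"
    return ParseResult("rule", sql, 0.9)


def llm_parser(query: str) -> Optional[ParseResult]:
    """Stub standing in for an expensive LLM text-to-SQL call."""
    llm_calls.append(query)
    return ParseResult("llm", f"-- SQL an LLM would generate for: {query}", 0.7)


def parse(query: str, rules: List[Parser], fallback: Parser,
          threshold: float = 0.8) -> Optional[ParseResult]:
    # Cheap rule-based parsers first: a confident match skips the LLM entirely.
    for rule in rules:
        result = rule(query)
        if result is not None and result.confidence >= threshold:
            return result
    # No rule fits: fall back to the slower, costlier LLM-based parser.
    return fallback(query)


print(parse("revenue by region", [metric_by_dimension_rule], llm_parser).sql)
# SELECT region, SUM(revenue) FROM t GROUP BY region
print(parse("which region grew fastest?", [metric_by_dimension_rule], llm_parser).parser)
# llm
```

The latency and cost savings come from the fact that frequent, well-structured questions never reach the LLM; only the long tail does.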
With these ideas in mind, we developed SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development of data chatbots, we decided to open-source SuperSonic as an extensible framework.
Out-of-the-box Features
- Built-in CUI (Chat User Interface) for business users to enter data queries
- Built-in GUI (Graphical User Interface) for analytics engineers to build semantic models
- Built-in GUI for system administrators to manage chat plugins and agents
- Support input auto-completion as well as query recommendation
- Support multi-turn conversation and history context management
- Support four-level permission control: domain-level, model-level, column-level and row-level
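As an illustration of how the four permission levels can compose, here is a minimal sketch assuming a user's grants are stored as sets of domains, models, and columns plus one row-filter predicate. The `Grant` structure, `PermissionDenied`, and `authorize` are hypothetical and do not describe SuperSonic's implementation. Domain- and model-level checks reject a query outright, column-level permission removes unauthorized columns, and row-level permission appends a filter to the WHERE clause.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Grant:
    domains: Set[str] = field(default_factory=set)
    models: Set[str] = field(default_factory=set)
    columns: Set[str] = field(default_factory=set)
    row_filter: Optional[str] = None  # SQL predicate, e.g. "region = 'EU'"


class PermissionDenied(Exception):
    pass


def authorize(grant: Grant, domain: str, model: str,
              columns: List[str], table: str) -> str:
    # Domain level and model level: both must be granted to query at all.
    if domain not in grant.domains:
        raise PermissionDenied(f"no access to domain {domain!r}")
    if model not in grant.models:
        raise PermissionDenied(f"no access to model {model!r}")
    # Column level: keep only the columns the user may see.
    visible = [c for c in columns if c in grant.columns]
    if not visible:
        raise PermissionDenied("none of the requested columns are permitted")
    sql = f"SELECT {', '.join(visible)} FROM {table}"
    # Row level: restrict the result to rows matching the user's predicate.
    if grant.row_filter:
        sql += f" WHERE {grant.row_filter}"
    return sql


analyst = Grant({"sales"}, {"orders"}, {"region", "amount"}, "region = 'EU'")
print(authorize(analyst, "sales", "orders", ["region", "amount", "customer_email"], "orders"))
# SELECT region, amount FROM orders WHERE region = 'EU'
```

Silently dropping unauthorized columns is one design choice. Rejecting the whole query instead is stricter and makes missing data more visible to the user.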
Extensible Components
The high-level architecture and main process flow are as follows:
- Schema Mapper: identifies references to schema elements (metrics, dimensions, entities, and values) in user queries by matching the query text against a knowledge base constructed from the semantic models.
- Semantic Parser: understands user queries and extracts semantic information. It combines rule-based and model-based parsers, each of which handles specific scenarios.
- Semantic Corrector: checks the validity of the extracted semantic information and performs correction and optimization if needed.
- Semantic Layer: executes queries according to the extracted semantic information. It generates SQL queries and runs them against the physical data models.
- Chat Plugin: extends functionality with third-party tools. Given all configured plugins, each with a function description and sample questions, the LLM selects the most suitable one.
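To make the flow from query text to SQL concrete, here is a toy end-to-end sketch of the first four components. Everything in it is a hypothetical simplification: `KNOWLEDGE_BASE` stands in for the knowledge base built from semantic models, the parser is a single rule-based stand-in (no model-based parser), the corrector only deduplicates and supplies a default metric, and the table name `page_visits` is invented. Plugin selection is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical knowledge base built from semantic models:
# surface term in user queries -> (element kind, element payload)
KNOWLEDGE_BASE: Dict[str, Tuple[str, object]] = {
    "page views": ("metric", "pv"),
    "visits": ("metric", "pv"),
    "department": ("dimension", "department"),
    "hr": ("value", ("department", "HR")),
}


@dataclass
class SemanticInfo:
    metrics: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)
    filters: List[Tuple[str, str]] = field(default_factory=list)


def schema_map(query: str, kb: Dict[str, Tuple[str, object]]) -> List[Tuple[str, object]]:
    """Schema Mapper: find schema elements mentioned in the query, longest terms first."""
    text = query.lower()
    matches = []
    for term in sorted(kb, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"  # whole words only
        if re.search(pattern, text):
            matches.append(kb[term])
            text = re.sub(pattern, " ", text)  # consume the term so it is not re-matched
    return matches


def semantic_parse(matches: List[Tuple[str, object]]) -> SemanticInfo:
    """Semantic Parser (rule-based toy): sort matched elements into slots."""
    info = SemanticInfo()
    for kind, payload in matches:
        if kind == "metric":
            info.metrics.append(payload)
        elif kind == "dimension":
            info.dimensions.append(payload)
        elif kind == "value":
            info.filters.append(payload)
    return info


def correct(info: SemanticInfo, default_metric: str = "pv") -> SemanticInfo:
    """Semantic Corrector: drop duplicates and fill in a default metric if none was found."""
    return SemanticInfo(
        metrics=list(dict.fromkeys(info.metrics)) or [default_metric],
        dimensions=list(dict.fromkeys(info.dimensions)),
        filters=list(dict.fromkeys(info.filters)),
    )


def to_sql(info: SemanticInfo, table: str = "page_visits") -> str:
    """Semantic Layer: turn the corrected semantic information into SQL."""
    select = ", ".join(info.dimensions + [f"SUM({m}) AS {m}" for m in info.metrics])
    sql = f"SELECT {select} FROM {table}"
    if info.filters:
        sql += " WHERE " + " AND ".join(f"{col} = '{val}'" for col, val in info.filters)
    if info.dimensions:
        sql += " GROUP BY " + ", ".join(info.dimensions)
    return sql


def answer(query: str) -> str:
    return to_sql(correct(semantic_parse(schema_map(query, KNOWLEDGE_BASE))))


print(answer("Page views by department"))
# SELECT department, SUM(pv) AS pv FROM page_visits GROUP BY department
print(answer("visits in HR"))
# SELECT SUM(pv) AS pv FROM page_visits WHERE department = 'HR'
```

Matching longest terms first and on word boundaries keeps short value terms such as "hr" from firing inside unrelated words like "three", which is the kind of precision problem a schema mapper has to address before parsing.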
Quick Demo
SuperSonic comes with sample semantic models and chat conversations that can serve as a starting point. To try them, follow these steps:
- Download the latest prebuilt binary from the release page
- Run script "bin/start-standalone.sh" to start a standalone server
- Visit http://localhost:9080 in the browser to start exploration
How to Build
SuperSonic can be deployed in two modes: standalone (for a quick demo) and distributed (for production use).
Build for Standalone Mode
Pull the source code and run the script "assembly/bin/build-standalone.sh" to build a single package.
Build for Distributed Mode
Pull the source code and run the scripts "assembly/bin/build-chat.sh" and "assembly/bin/build-semantic.sh" to build the chat and semantic packages separately.