mirror of
https://github.com/tencentmusic/supersonic.git
synced 2025-12-10 19:38:13 +00:00
[improvement][docs]update README and architecture diagram
README.md (32 lines changed)
@@ -10,35 +10,37 @@ English | [中文](README_CN.md)
 
 The emergence of Large Language Models (LLMs) such as ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLMs to convert natural language queries into SQL queries. While some works show promising results, they are still not applicable to real-world scenarios.
 
-From our perspective, the key to filling the real-world gap lies in two aspects:
-1. Utilize a combination of rule-based and model-based semantic parsers to deal with different scenarios.
-2. Introduce a semantic model layer encapsulating the underlying data complexity(joins, formulas, etc) to simplify semantic parsing.
+From our perspective, the key to filling the real-world gap lies in three aspects:
+1. Complement the LLM-based semantic parser with rule-based semantic parsers to improve **efficiency** (in terms of latency and cost).
+2. Augment semantic parsing with schema mappers (as a kind of preprocessor) and semantic correctors (as a kind of postprocessor) to improve **accuracy** and **stability**.
+3. Introduce a semantic layer encapsulating the underlying data context (joins, formulas, etc.) to reduce **complexity**.
 
 With these ideas in mind, we developed SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development of data chatbots, we decided to open-source SuperSonic as an extensible framework.
 
 ## Out-of-the-box Features
 
-- Built-in graphical interface for business users to enter data queries
-- Built-in graphical interface for analytics engineers to manage semantic models
+- Built-in CUI (Chat User Interface) for *business users* to enter data queries
+- Built-in GUI (Graphical User Interface) for *analytics engineers* to build semantic models
+- Built-in GUI for *system administrators* to manage chat plugins and agents
 - Support input auto-completion as well as query recommendation
 - Support multi-turn conversation and history context management
-- Support three-level permission control: domain-level, column-level and row-level
+- Support four-level permission control: domain-level, model-level, column-level and row-level
 
 ## Extensible Components
 
-The high-level architecture and main process flow is shown in below diagram:
+The high-level architecture and main process flow are as follows:
 
-<img src="./docs/images/supersonic_components.png" height="80%" width="80%" align="center"/>
+<img src="./docs/images/supersonic_components.png" height="65%" width="65%" align="center"/>
 
-- **Chat Interface:** accepts natural language queries and answer results with appropriate visualization charts. It supports input auto-completion as well as multi-turn conversation.
+- **Schema Mapper:** identifies references to schema elements (metrics/dimensions/entities/values) in user queries. It matches the query text against a knowledge base constructed from the semantic models.
 
-- **Modeling Interface:** empowers analytics engineers to visually define and maintain semantic models. The configurations related to access permission and chat conversation can also be set on the UI.
+- **Semantic Parser:** understands user queries and extracts semantic information. It consists of a combination of rule-based and model-based parsers, each of which deals with specific scenarios.
 
-- **Schema Mapper Chain:** identifies references to schema elements(metrics/dimensions/entities/values) in user queries. It matches the query text against a knowledge base constructed from the semantic models.
+- **Semantic Corrector:** checks the validity of extracted semantic information and performs correction and optimization if needed.
 
-- **Semantic Parser Chain:** understands user queries and extract semantic information. It consists of a combination of rule-based and model-based parsers, each of which deals with specific scenarios.
+- **Semantic Layer:** performs execution according to extracted semantic information. It generates SQL queries and executes them against physical data models.
 
-- **Semantic Query:** performs execution according to extracted semantic information. It generates SQL queries and executes them against physical data models.
+- **Chat Plugin:** extends functionality with third-party tools. Given all configured plugins with function descriptions and sample questions, the LLM selects the most suitable one.
 
 ## Quick Demo
 
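To make the revised component list concrete, here is a minimal Python sketch of how one query could flow through the four stages named in the new README text: Schema Mapper, Semantic Parser, Semantic Corrector, Semantic Layer. It is not SuperSonic's actual implementation; every name in it (`KNOWLEDGE_BASE`, `schema_mapper`, `rule_parser`, `default_llm_parser`, `semantic_corrector`, `semantic_layer`, `METRIC_EXPR`, the `orders` table) is hypothetical, and the LLM-based parser is only a stub. What it tries to show is the ordering behind aspect 1: a cheap rule-based parser runs first, and the LLM-based parser is consulted only when the rules fail.

```python
# Hypothetical sketch of the README's query flow:
# Schema Mapper -> Semantic Parser -> Semantic Corrector -> Semantic Layer.
# None of these names are SuperSonic APIs.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Toy knowledge base "built from a semantic model": surface term -> (kind, element).
KNOWLEDGE_BASE = {
    "revenue": ("metric", "revenue"),
    "sales": ("metric", "revenue"),  # synonym resolving to the same metric
    "city": ("dimension", "city"),
}
# Semantic-layer context hidden from the parser: metric formulas and the physical table.
METRIC_EXPR = {"revenue": "SUM(order_amount)"}
MODEL_TABLE = "orders"


@dataclass
class SemanticParse:
    metrics: list[str] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)


def schema_mapper(query: str) -> list[tuple[str, str]]:
    """Preprocessor: match query tokens against the knowledge base."""
    tokens = query.lower().replace("?", " ").split()
    return [KNOWLEDGE_BASE[t] for t in tokens if t in KNOWLEDGE_BASE]


def rule_parser(mapped: list[tuple[str, str]]) -> Optional[SemanticParse]:
    """Cheap rule: succeed only when at least one metric was mapped."""
    metrics = [name for kind, name in mapped if kind == "metric"]
    dims = [name for kind, name in mapped if kind == "dimension"]
    return SemanticParse(metrics, dims) if metrics else None


def default_llm_parser(mapped: list[tuple[str, str]]) -> SemanticParse:
    """Stub standing in for an LLM call; always picks the 'revenue' metric."""
    dims = [name for kind, name in mapped if kind == "dimension"]
    return SemanticParse(["revenue"], dims)


def semantic_corrector(parse: SemanticParse) -> SemanticParse:
    """Postprocessor: drop duplicate metrics/dimensions, keeping their order."""
    return SemanticParse(list(dict.fromkeys(parse.metrics)),
                         list(dict.fromkeys(parse.dimensions)))


def semantic_layer(parse: SemanticParse) -> str:
    """Expand metric formulas and emit SQL against the physical table."""
    cols = parse.dimensions + [f"{METRIC_EXPR[m]} AS {m}" for m in parse.metrics]
    sql = f"SELECT {', '.join(cols)} FROM {MODEL_TABLE}"
    if parse.dimensions:
        sql += f" GROUP BY {', '.join(parse.dimensions)}"
    return sql


def answer(query: str,
           llm_parser: Callable[[list[tuple[str, str]]], SemanticParse] = default_llm_parser) -> str:
    mapped = schema_mapper(query)
    parse = rule_parser(mapped) or llm_parser(mapped)  # LLM only if rules fail
    return semantic_layer(semantic_corrector(parse))


print(answer("sales and revenue by city"))
# SELECT city, SUM(order_amount) AS revenue FROM orders GROUP BY city
```

Because `rule_parser(...) or llm_parser(...)` short-circuits, the stand-in LLM is never called when the rules already yield a parse. The corrector here only removes duplicates (both "sales" and "revenue" map to one metric), a deliberately narrow stand-in for real validity checks.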
@@ -46,11 +48,11 @@ SuperSonic comes with sample semantic models as well as chat conversations that
 
 - Download the latest prebuilt binary from the [release page](https://github.com/tencentmusic/supersonic/releases)
 - Run the script "bin/start-standalone.sh" to start a standalone server
-- Visit http://localhost:9080 in browser to start exploration
+- Visit http://localhost:9080 in the browser to start exploration
 
 ## How to Build
 
-SuperSonic can be deployed in two modes: standalone (intended for quick demo) and distributed (intended for production).
+SuperSonic can be deployed in two modes: standalone (for a quick demo) and distributed (for production use).
 
 ### Build for Standalone Mode
 
README_CN.md (28 lines changed)
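@@ -8,35 +8,37 @@
 
 The emergence of large language models (LLMs) such as ChatGPT is reshaping the way information is retrieved. In the field of data analytics, academia and industry have mainly focused on using deep learning models to convert natural language queries into SQL queries. Although some work has shown promising results, it is not yet applicable to real-world scenarios.
 
-In our view, there are two keys to delivering value in real-world scenarios:
-1. Combine rule-based and model-based semantic parsers, playing to the strengths of each, to handle different scenarios.
-2. Introduce a semantic model layer to encapsulate the underlying data complexity (joins, formulas, etc.), thereby narrowing the solution space of semantic parsing.
+In our view, there are three keys to delivering value in real-world scenarios:
+1. On top of the LLM-based semantic parser, add rule-based parsers to improve the **efficiency** of semantic parsing.
+2. Add schema mappers and semantic correctors to strengthen semantic parsing and improve its **accuracy** and **stability**.
+3. Introduce a semantic model layer that encapsulates the underlying data context (joins, formulas, etc.) to reduce the **complexity** of semantic parsing.
 
 To validate these ideas, we developed the 超音数 (SuperSonic) project and applied it to real internal products. At the same time, we have open-sourced 超音数 as an extensible framework, in the hope of advancing the field of conversational data Q&A.
 
 ## Out-of-the-box Features
 
-- Built-in graphical interface for business users to enter data queries.
-- Built-in graphical interface for analytics engineers to manage semantic models.
+- Built-in chat interface for *business users* to enter data queries.
+- Built-in graphical interface for *analytics engineers* to build semantic models.
+- Built-in graphical interface for *system administrators* to manage chat plugins and agents.
 - Supports input auto-completion and query recommendation.
 - Supports multi-turn conversation, switching context automatically based on the dialogue.
-- Supports three-level permission control: domain level, column level, and row level.
+- Supports four-level permission control: domain level, model level, column level, and row level.
 
 ## Extensible Components
 
 The overall architecture and main process flow of 超音数 are shown in the diagram below:
 
-<img src="./docs/images/supersonic_components.png" height="80%" width="80%" align="center"/>
+<img src="./docs/images/supersonic_components.png" height="65%" width="65%" align="center"/>
 
-- **Chat interface**: accepts user queries and presents the results with suitable visualization charts; supports input auto-completion and multi-turn conversation.
+- **Schema Mapper:** builds a knowledge base from the semantic models, then matches the natural language text against the knowledge base to provide relevant information for subsequent semantic parsing.
 
-- **Modeling interface**: lets analytics engineers define and maintain semantic models visually; configurations related to access permissions and chat conversations can also be set in the UI.
+- **Semantic Parser:** understands user queries and extracts semantic information. It consists of a set of rule-based and model-based parsers, each of which handles a specific scenario.
 
-- **Schema mapper chain**: builds a knowledge base from the semantic models, then matches the natural language text against the knowledge base to provide relevant information for subsequent semantic parsing.
+- **Semantic Corrector:** checks the validity of the semantic information and corrects and optimizes any invalid information.
 
-- **Semantic parser chain**: understands user queries and extracts semantic information. It consists of a set of rule-based and model-based parsers, each of which handles a specific scenario.
+- **Semantic Layer:** generates physical SQL from the semantic information and executes the query.
 
-- **Semantic query**: generates physical SQL from the semantic information and executes the query.
+- **Chat Plugin:** extends functionality through third-party tools. Given all configured plugins with their function descriptions and sample questions, the LLM selects the most suitable plugin.
 
 ## Quick Demo
 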
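The four-level permission control listed in both READMEs (domain, model, column, row) can be pictured as a sequence of increasingly fine-grained checks. The Python sketch below assumes that nesting, and assumes that a denied domain, model, or column raises an error while row-level rules silently filter rows; `Grant`, `apply_permissions`, and the sample data are hypothetical and do not reflect how SuperSonic actually stores or enforces permissions.

```python
# Hypothetical sketch of four-level permission checks
# (domain -> model -> column -> row). Not SuperSonic's implementation.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict

Row = Dict[str, Any]


@dataclass
class Grant:
    domains: set[str]
    models: set[str]
    columns: dict[str, set[str]]  # model -> columns the user may read
    row_filters: dict[str, Callable[[Row], bool]] = field(default_factory=dict)


def apply_permissions(grant: Grant, domain: str, model: str,
                      columns: list[str], rows: list[Row]) -> list[Row]:
    # Levels 1 and 2: coarse access to the domain and to the model within it.
    if domain not in grant.domains:
        raise PermissionError(f"no access to domain {domain!r}")
    if model not in grant.models:
        raise PermissionError(f"no access to model {model!r}")
    # Level 3: every requested column must be visible to the user.
    denied = [c for c in columns if c not in grant.columns.get(model, set())]
    if denied:
        raise PermissionError(f"no access to columns {denied}")
    # Level 4: keep only rows accepted by the model's row filter (default: all rows),
    # evaluated on the full row before projecting to the requested columns.
    keep = grant.row_filters.get(model, lambda row: True)
    return [{c: row[c] for c in columns} for row in rows if keep(row)]


orders = [{"city": "A", "region": "north", "revenue": 10},
          {"city": "B", "region": "south", "revenue": 20}]
analyst = Grant(domains={"music"}, models={"sales"},
                columns={"sales": {"city", "revenue"}},
                row_filters={"sales": lambda row: row["region"] == "north"})
print(apply_permissions(analyst, "music", "sales", ["city", "revenue"], orders))
# [{'city': 'A', 'revenue': 10}]
```

Here the row filter may depend on a column the user cannot select (`region`), which is typical of row-level rules. Rejecting a hidden column outright rather than silently dropping it is one design choice among several; masking the value would be another.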
@@ -56,4 +58,4 @@
 
 ### Build for Distributed Mode
 
 Download the source package and run the scripts "assembly/bin/build-chat.sh" and "assembly/bin/build-semantic.sh" to compile and package the chat-layer service and the semantic-layer service, respectively.
Binary file not shown. Size: 274 KiB before, 285 KiB after.