[docs]update README: new architecture diagram and demo gif (#15)

Jun Zhang
2023-07-13 14:11:30 +08:00
committed by jerryjzhang
parent c40efebc7a
commit a0869dc7bd
4 changed files with 56 additions and 22 deletions


@@ -2,7 +2,9 @@ English | [中文](README_CN.md)
 # SuperSonic (超音数)
-**SuperSonic is an out-of-the-box yet highly extensible framework for building a data chatbot**. SuperSonic provides a chat interface that empowers users to query data using natural language and visualize the results with suitable charts. To enable such experience, the only thing necessary is to build a logical semantic model (definition of metrics, dimensions, relationships, etc) on top of the physical data stores, and no data modification or copying is required. Meanwhile SuperSonic is designed to be plug-and-play, allowing new functionalities to be added through plugins and core components to be integrated into other systems.
+**SuperSonic is an out-of-the-box yet highly extensible framework for building a data chatbot**. SuperSonic provides a chat interface that empowers users to query data using natural language and visualize the results with suitable charts. To enable such an experience, the only thing necessary is to define logical semantic models (metrics, dimensions, relationships, etc.) on top of physical data models, and no data modification or copying is required. Meanwhile, SuperSonic is designed to be pluggable, allowing new functionalities to be added through plugins and core components to be integrated into other systems.
+<img src="./docs/images/supersonic_demo.gif" align="center"/>
 ## Motivation
@@ -19,32 +21,47 @@ With these ideas in mind, we developed SuperSonic as a reference implementation
 - Built-in graphical interface for business users to enter data queries
 - Built-in graphical interface for analytics engineers to manage semantic models
 - Support input auto-completion as well as query recommendation
-- Support multi-turn conversation and switch context automatically
+- Support multi-turn conversation and history context management
 - Support three-level permission control: domain-level, column-level and row-level
 ## Extensible Components
-SuperSonic contains four core components, each of which can be extended or integrated:
-<img src="./docs/images/supersonic_components.png" height="50%" width="50%" align="center"/>
-- **Chat interface:** accepts user queries and answer results with appropriate visualization charts. It supports input auto-completion as well as multi-turn conversation.
-- **Schema mapper:** identifies references to schema elements in natural language queries. It matches queries against the knowledge base which is constructed using the schema of semantic models.
-- **Semantic parser chain:** resolves query mode and choose the most suitable semantic model. It is composed of a group of rule-based and model-based parsers, each of which deals with specific scenarios.
-- **Semantic model layer:** manages semantic models and generate SQL statement given specific semantic model and related semantic items. It encapsulates technical concepts, calculation formulas and entity relationships of the underlying data.
+SuperSonic is composed of two layers: supersonic-chat and supersonic-semantic. The chat layer is responsible for converting a **natural language query** into a semantic query (also known as a DSL query), whereas the semantic layer is responsible for converting the DSL query into a **SQL query**. The high-level architecture and main process flow are shown in the diagram below:
+<img src="./docs/images/supersonic_components.png" height="80%" width="80%" align="center"/>
+### Chat Layer
+The chat layer contains four core components:
+- **Chat Interface:** accepts user queries and answers them with appropriate visualization charts. It supports input auto-completion as well as multi-turn conversation.
+- **Schema Mapper Chain:** identifies references to semantic schema elements in user queries. It matches queries against the knowledge base, which is constructed from the schema of semantic models.
+- **Semantic Parser Chain:** resolves the query mode based on the mapped semantic models. It is composed of a group of rule-based and model-based parsers, each of which deals with specific scenarios.
+- **Semantic Query:** executes the query according to the results of semantic parsing. The default semantic query submits the DSL to the semantic component, but new types of semantic query can be added.
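The chat-layer flow described in the added lines above (map schema elements, run a parser chain, emit a DSL query) can be sketched roughly as follows. This is a minimal illustration and not SuperSonic's actual code: the knowledge base contents, the `sales` model name, the two parsers, and the first-match chain strategy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical knowledge base derived from semantic-model schemas:
# query term -> (element kind, element name). Not SuperSonic's real data.
KNOWLEDGE_BASE = {
    "revenue": ("metric", "revenue"),
    "orders": ("metric", "order_count"),
    "region": ("dimension", "region"),
}


@dataclass
class ParseResult:
    mode: str
    metrics: list[str] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)


def map_schema(query: str) -> list[tuple[str, str]]:
    """Schema mapper: find references to schema elements in the query text."""
    words = query.lower().replace("?", " ").split()
    return [KNOWLEDGE_BASE[w] for w in words if w in KNOWLEDGE_BASE]


def metric_groupby_parser(elements) -> Optional[ParseResult]:
    """Rule-based parser for 'metric by dimension' questions."""
    metrics = [name for kind, name in elements if kind == "metric"]
    dims = [name for kind, name in elements if kind == "dimension"]
    if metrics and dims:
        return ParseResult("METRIC_GROUPBY", metrics, dims)
    return None


def metric_only_parser(elements) -> Optional[ParseResult]:
    """Rule-based parser for questions that mention only metrics."""
    metrics = [name for kind, name in elements if kind == "metric"]
    return ParseResult("METRIC_ONLY", metrics) if metrics else None


# Assumed strategy: the first parser that recognizes the scenario wins.
PARSER_CHAIN = [metric_groupby_parser, metric_only_parser]


def parse(elements) -> Optional[ParseResult]:
    for parser in PARSER_CHAIN:
        result = parser(elements)
        if result is not None:
            return result
    return None


def to_dsl(result: ParseResult, model: str = "sales") -> str:
    """Semantic query: emit SQL-like DSL with no joins or formulas."""
    columns = result.dimensions + result.metrics
    dsl = f"SELECT {', '.join(columns)} FROM {model}"
    if result.dimensions:
        dsl += f" GROUP BY {', '.join(result.dimensions)}"
    return dsl


if __name__ == "__main__":
    print(to_dsl(parse(map_schema("revenue by region"))))
```

Keeping each parser as a plain function that returns `None` when it does not apply makes it straightforward to add a model-based parser to the chain without touching the others.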
+### Semantic Layer
+The semantic layer contains four core components:
+- **Modeling Interface:** empowers analytics engineers to visually define and maintain semantic models. The configurations related to access permission and chat conversation can also be set on the UI.
+- **DSL Parser:** converts DSL expressions into intermediate structures. To make it easy to integrate with analytics applications, SQL (without joins and calculation formulas) is used as the DSL.
+- **Query Planner:** builds and optimizes query plans according to various rules.
+- **SQL Generator:** generates the final SQL expression (with joins and calculation formulas) based on the query plan.
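To make the semantic-layer steps above more concrete, here is a small sketch of a DSL parser, a query planner, and a SQL generator working together. Everything in it is hypothetical: the `sales` model, its tables, the metric formula, and the join rule are invented for illustration, and the regex handles only a tiny `SELECT ... FROM ... GROUP BY ...` subset rather than real SQL.

```python
import re
from dataclasses import dataclass

# Hypothetical semantic model: metric formulas, dimension expressions, and joins.
SEMANTIC_MODELS = {
    "sales": {
        "table": "orders o",
        "metrics": {"revenue": "SUM(o.price * o.quantity)"},
        "dimensions": {"region": "c.region"},
        "joins": {"c": "JOIN customers c ON o.customer_id = c.id"},
    }
}


@dataclass
class DslQuery:
    """Intermediate structure produced by the DSL parser."""
    columns: list[str]
    model: str
    group_by: list[str]


_DSL = re.compile(
    r"^SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<model>\w+)"
    r"(?:\s+GROUP BY\s+(?P<group>.+))?$",
    re.IGNORECASE,
)


def _split(text):
    return [part.strip() for part in text.split(",")] if text else []


def parse_dsl(dsl: str) -> DslQuery:
    """DSL parser: turn join-free, formula-free SQL into a DslQuery."""
    match = _DSL.match(dsl.strip())
    if match is None:
        raise ValueError(f"unsupported DSL: {dsl!r}")
    return DslQuery(_split(match["cols"]), match["model"], _split(match["group"]))


def plan_joins(query: DslQuery) -> list[str]:
    """Query planner: keep only the joins that the requested dimensions need."""
    model = SEMANTIC_MODELS[query.model]
    needed = []
    for column in query.columns:
        expr = model["dimensions"].get(column)
        if expr is None:
            continue  # metrics live on the base table in this toy model
        join = model["joins"].get(expr.split(".")[0])
        if join and join not in needed:
            needed.append(join)
    return needed


def generate_sql(query: DslQuery, joins: list[str]) -> str:
    """SQL generator: expand metric formulas and append the planned joins."""
    model = SEMANTIC_MODELS[query.model]

    def expand(column: str) -> str:
        if column in model["metrics"]:
            return f'{model["metrics"][column]} AS {column}'
        return f'{model["dimensions"][column]} AS {column}'

    sql = f'SELECT {", ".join(expand(c) for c in query.columns)} FROM {model["table"]}'
    if joins:
        sql += " " + " ".join(joins)
    if query.group_by:
        sql += " GROUP BY " + ", ".join(model["dimensions"][g] for g in query.group_by)
    return sql


if __name__ == "__main__":
    q = parse_dsl("SELECT region, revenue FROM sales GROUP BY region")
    print(generate_sql(q, plan_joins(q)))
```

The point of the split is that the DSL stays free of joins and formulas, so a caller only names metrics and dimensions while the physical details stay in the semantic model.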
 ## Quick Demo
-SuperSonic comes with a sample semantic data model as well as sample chat that can be used as a starting point. Please follow the steps:
-- Download the latest prebuilt binary from the release page
-- Run script "bin/start-all.sh" to start services
-- Visit http://localhost:9080 in browser to explore chat interface
-- Visit http://localhost:9081 in browser to explore modeling interface
+SuperSonic comes with sample semantic models as well as chat conversations that can be used as a starting point. Please follow the steps:
+- Download the latest prebuilt binary from the [release page](https://github.com/tencentmusic/supersonic/releases)
+- Run script "bin/start-standalone.sh" to start a standalone server
+- Visit http://localhost:9080 in browser to start exploration
 ## How to Build
-Download the source code and run script "assembly/bin/build-all.sh" to build both front-end webapp and back-end services
+Pull the source code and run script "assembly/bin/build-standalone.sh" to build packages in the standalone mode.


@@ -2,6 +2,8 @@
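 **SuperSonic is an out-of-the-box and easily extensible framework for conversational data question answering**. Through SuperSonic's chat interface, users can query data in natural language, and the system chooses suitable visualization charts to present the results. SuperSonic requires no modification or copying of data: building a logical semantic model (defining metrics, dimensions, their relationships, etc.) on top of the physical database is enough to enable data question answering. Meanwhile, SuperSonic is designed as a pluggable framework, allowing new functionality to be added as plugins or core components to be integrated with other systems.
+<img src="./docs/images/supersonic_demo.gif" align="center"/>
 ## Motivation
 The emergence of large language models (LLMs) such as ChatGPT is reshaping how information is retrieved. In data analytics, academia and industry have mainly focused on using deep learning models to convert natural language queries into SQL queries. Although some work shows promising results, it is not yet suitable for real-world scenarios.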
@@ -22,9 +24,13 @@
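 ## Extensible Components
-SuperSonic contains four core components, each of which is easy to extend or integrate:
-<img src="./docs/images/supersonic_components.png" height="50%" width="50%" align="center"/>
+SuperSonic consists of two main layers: supersonic-chat and supersonic-semantic. The chat layer converts natural language queries into semantic queries (also called DSL queries), while the semantic layer converts DSL queries into SQL queries. The overall architecture and main process flow of SuperSonic are shown in the diagram below:
+<img src="./docs/images/supersonic_components.png" height="80%" width="80%" align="center"/>
+### Chat Layer
+The chat layer contains the following four core components:
 - **Chat interface:** accepts user queries and presents the results with suitable visualization charts. It supports input auto-completion and multi-turn conversation.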
@@ -32,17 +38,28 @@
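 - **Semantic parser chain:** identifies the query mode and selects the best-matching semantic model. It consists of a group of rule-based or model-based parsers, each of which handles different specific scenarios.
-- **Semantic model layer:** builds and manages semantic models at modeling time, and generates SQL statements from the given semantic model at query time.
+- **Semantic query:** executes the query according to the results of semantic parsing. The default semantic query submits the DSL to the semantic component, but new types of queries can be added.
+### Semantic Layer
+The semantic layer contains the following four core components:
+- **Modeling interface:** enables analytics engineers to define and maintain semantic models visually. Configurations related to access permissions and chat conversations can also be set in the UI.
+- **DSL parser:** converts DSL expressions into intermediate structures. To make it easy to integrate with analytics applications, SQL (without joins and calculation formulas) is used as the DSL.
+- **Query planner:** builds and optimizes query plans according to various rules.
+- **SQL generator:** generates the final SQL statement (with joins and calculation formulas) based on the query plan.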
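 ## Quick Demo
 SuperSonic ships with sample semantic models and chat conversations. Only the following three steps are needed to try it out:
-- Download the prebuilt release package from the release page
-- Run "bin/start-all.sh" to start the front-end and back-end services
-- Visit http://localhost:9080 in a browser to explore data question answering
-- Visit http://localhost:9081 in a browser to explore semantic modeling
+- Download the prebuilt release package from the [release page](https://github.com/tencentmusic/supersonic/releases)
+- Run "bin/start-standalone.sh" to start the services
+- Visit http://localhost:9080 in a browser to start exploring
 ## How to Build
-Download the source package and run the script "assembly/bin/build-all.sh", which compiles and packages the front end and back end together
+Download the source package and run the script "assembly/bin/build-standalone.sh", which compiles and packages all services together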

Binary file not shown. Size before: 362 KiB. Size after: 358 KiB.

Binary file not shown. Size after: 12 MiB.