diff --git a/README.md b/README.md
index ad7ed42b7..bd475cae4 100644
--- a/README.md
+++ b/README.md
@@ -2,19 +2,19 @@ English | [中文](README_CN.md)
# SuperSonic (超音数)
-**SuperSonic is an out-of-the-box yet highly extensible framework for building a data chatbot**. SuperSonic provides a chat interface that empowers users to query data using natural language and visualize the results with suitable charts. To enable such experience, the only thing necessary is to define logical semantic models (metrics, dimensions, aliases, relationships, etc) on top of physical data models, and no data modification or copying is required. Meanwhile SuperSonic is designed to be plugable, allowing new functionalities to be added through plugins and core components to be integrated into other systems.
+**SuperSonic is an out-of-the-box yet highly extensible framework for building a data chatbot**. SuperSonic provides a chat interface that empowers users to query data using natural language and visualize the results with suitable charts. To enable such an experience, the only thing necessary is to build logical semantic models (definitions of metrics, dimensions, and entities, along with their meaning, context, and relationships) on top of physical data models; no data modification or copying is required. Meanwhile, SuperSonic is designed to be pluggable, allowing new functionality to be added through plugins and core components to be integrated with other systems.
-
+
## Motivation
-The emergence of Large Language Models (LLMs) like ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging deep learning models to convert natural language queries into SQL queries. While some works show promising results, they are not applicable to real-world scenarios.
+The emergence of Large Language Models (LLMs) like ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLMs to convert natural language queries into SQL queries. While some approaches show promising results, they are still not applicable to real-world scenarios.
From our perspective, the key to filling the real-world gap lies in two aspects:
-1. Utilize a combination of rule-based and model-based semantic parsers to deal with different scenarios
-2. Introduce a semantic model layer to encapsulate underlying complexity thus simplify the semantic parsers
+1. Utilize a combination of rule-based and model-based semantic parsers to deal with different scenarios.
+2. Introduce a semantic model layer that encapsulates the underlying data complexity (joins, formulas, etc.) to simplify semantic parsing.
-With these ideas in mind, we developed SuperSonic as a practical reference implementation and used it to power our real-world products. Additionally, to encourage further development of data chatbots, we decided to open source SuperSonic as an extensible framework.
+With these ideas in mind, we developed SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development of data chatbots, we decided to open source SuperSonic as an extensible framework.
## Out-of-the-box Features
@@ -26,33 +26,19 @@ With these ideas in mind, we developed SuperSonic as a practical reference imple
## Extensible Components
-SuperSonic is composed of two layers: supersonic-chat and supersonic-semantic. The chat layer is responsible for converting **natural language query** into semantic query (also known as DSL query), whereas the semantic layer is responsible for converting DSL query into **SQL query**. The high-level architecture and main process flow is shown in below diagram:
+The high-level architecture and main process flow are shown in the diagram below:
-
+
-### Chat Layer
-
-The chat layer contains four core components:
-
-- **Chat Interface:** accepts user queries and answer results with appropriate visualization charts. It supports input auto-completion as well as multi-turn conversation.
-
-- **Schema Mapper Chain:** identifies references to semantic schema elements in user queries. It matches queries against the knowledage base which is constructed using the schema of semantic models.
-
-- **Semantic Parser Chain:** resolves query mode based on mapped semantic models. It is composed of a group of rule-based and model-based parsers, each of which deals with specific scenarios.
-
-- **Semantic Query:** performs execution according to the results of semantic parsing. The default semantic query would submit DSL to the semantic component, but new types of semantic query can be extended.
-
-### Semantic Layer
-
-The semantic layer contains four core components:
+- **Chat Interface:** accepts natural language queries and answers with results presented in appropriate visualization charts. It supports input auto-completion as well as multi-turn conversation.
- **Modeling Interface:** empowers analytics engineers to visually define and maintain semantic models. The configurations related to access permission and chat conversation can also be set on the UI.
-- **DSL Parser:** converts DSL expression to intermediate structures. To make it easily integratable with analytics applications, SQL (without joins and calculation formulas) is used as the DSL.
+- **Schema Mapper Chain:** identifies references to schema elements (metrics, dimensions, entities, and values) in user queries. It matches the query text against a knowledge base constructed from the semantic models.
-- **Query Planner:** builds and optimizes query plans according to various rules.
+- **Semantic Parser Chain:** understands user queries and extracts semantic information. It consists of a combination of rule-based and model-based parsers, each of which deals with specific scenarios.
-- **SQL Generator:** generates final SQL expression (with joins and calculation formulas) based on the query plan.
+- **Semantic Query:** performs execution according to extracted semantic information. It generates SQL queries and executes them against physical data models.
## Quick Demo
@@ -72,4 +58,4 @@ Pull the source code and run script "assembly/bin/build-standalone.sh" to build
### Build for Distributed Mode
-Pull the source code and run scripts "assembly/bin/build-chat.sh" and "assembly/bin/build-semantic.sh" separately to build packages.
+Pull the source code and run scripts "assembly/bin/build-chat.sh" and "assembly/bin/build-semantic.sh" separately to build packages.
\ No newline at end of file
diff --git a/README_CN.md b/README_CN.md
index f2b6e2606..e26fd5338 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -1,64 +1,50 @@
# 超音数(SuperSonic)
-**超音数是一个开箱即用且易于扩展的数据问答对话框架**。通过超音数的问答对话界面,用户能够使用自然语言查询数据,系统会选择合适的可视化图表呈现结果。超音数不需要修改或复制数据,只需要在物理数据库之上构建逻辑语义模型(定义指标、维度、别名、相互间关系等),即可开启数据问答体验。与此同时,超音数被设计为可插拔式框架,允许以插件形式来扩展新功能,或者将核心组件与其他系统集成。
+**超音数是一个开箱即用且易于扩展的数据问答对话框架**。通过超音数的问答对话界面,用户能够使用自然语言查询数据,系统会选择合适的可视化图表呈现结果。超音数不需要修改或复制数据,只需要在物理数据模型之上构建逻辑语义模型(指标/维度/实体的定义,以及它们的业务含义、相互间关系等),即可开启数据问答体验。与此同时,超音数被设计为可插拔式的框架,允许以插件形式来扩展新功能,或者将核心组件与其他系统集成。
-
+
## 项目动机
大型语言模型(LLMs)如ChatGPT的出现正在重塑信息检索的方式。在数据分析领域,学术界和工业界主要关注利用深度学习模型将自然语言查询转换为SQL查询。虽然一些工作显示出有前景的结果,但它们还并不适用于实际场景。
在我们看来,为了在实际场景发挥价值,有两个关键点:
-1. 将基于规则和基于模型的语义解析器相结合,发挥各自优势,以便处理不同的场景
-2. 引入语义模型层来封装数据底层的复杂性,从而简化语义解析器的问题求解空间
+1. 将基于规则和基于模型的语义解析器相结合,发挥各自优势,以便处理不同的场景。
+2. 引入语义模型层来封装数据底层的复杂性(关联、公式等),从而简化语义解析的求解空间。
-为了落地上述想法,我们开发了超音数项目,并将其应用在实际的内部产品中。与此同时,我们决定将超音数作为一个可扩展的框架开源,希望能够促进数据问答对话领域的进一步发展。
+为了验证上述想法,我们开发了超音数项目,并将其应用在实际的内部产品中。与此同时,我们将超音数作为一个可扩展的框架开源,希望能够促进数据问答对话领域的进一步发展。
## 开箱即用的特性
-- 内置图形界面以便业务用户输入数据查询
-- 内置图形界面以便分析工程师管理语义模型
-- 支持文本输入的联想和查询问题的推荐
-- 支持多轮对话,根据语境自动切换上下文
-- 支持三级权限控制:主题域级、列级、行级
+- 内置图形界面以便业务用户输入数据查询。
+- 内置图形界面以便分析工程师管理语义模型。
+- 支持文本输入的联想和查询问题的推荐。
+- 支持多轮对话,根据语境自动切换上下文。
+- 支持三级权限控制:主题域级、列级、行级。
## 易于扩展的组件
-超音数主要分为两层:supersonic-chat and supersonic-semantic。问答层负责将自然语言查询转换为语义查询(也称为DSL查询),而语义层负责将DSL查询转换为SQL查询。超音数的整体架构和主流程如下图所示:
+超音数的整体架构和主流程如下图所示:
-
-
-### 问答层
-
-问答层包含以下4个核心组件:
+
- **问答对话界面(chat interface)**:接受用户查询并选择合适的可视化图表呈现结果,支持输入联想和多轮对话。
-- **模式映射器(schema mapper)**:基于语义模型的schema构建知识库,然后将自然语言查询在知识库中进行匹配,为后续的语义解析提供相关信息。
-
-- **语义解析器链(semantic parser chain)**:识别查询模式并选择最匹配的语义模型,其由一组基于规则或模型的解析器组成,每个解析器可用于应对不同的特定场景。
-
-- **语义查询(semantic query)**: 根据语义解析的结果执行查询,默认的语义查询会将DSL提交给语义组件,但可以扩展新类型的查询。
-
-### 语义层
-
-语义层包含以下4个核心组件:
-
- **语义建模界面(modeling interface)**:使分析工程师能够通过可视化方式定义和维护语义模型,与访问权限和聊天对话相关的配置也可以在用户界面上设置。
-- **DSL解析器(DSL parser)**:将DSL表达式转换为中间结构。为了使其易于与分析应用程序集成,使用SQL(不含join和计算公式)来作为DSL。
+- **模式映射器(schema mapper chain)**:基于语义模型构建知识库,然后将自然语言文本在知识库中进行匹配,为后续的语义解析提供相关信息。
-- **查询计划器(query planner)**:根据各种规则来构建和优化查询计划。
+- **语义解析器(semantic parser chain)**:理解用户查询并抽取语义信息,其由一组基于规则和基于模型的解析器组成,每个解析器可应对不同的特定场景。
-- **SQL生成器(SQL genenrator)**:基于查询计划来生成最终的SQL语句((含join和计算公式))。
+- **语义查询(semantic query)**:根据语义信息生成物理SQL执行查询。
## 快速体验
超音数自带样例的语义模型和问答对话,只需以下三步即可快速体验:
-- 从[release page](https://github.com/tencentmusic/supersonic/releases)下载预先构建好的发行包
-- 运行 "bin/start-standalone.sh"启动服务
-- 在浏览器访问http://localhost:9080 开启探索
+- 从[release page](https://github.com/tencentmusic/supersonic/releases)下载预先构建好的发行包;
+- 运行 "bin/start-standalone.sh"启动服务;
+- 在浏览器访问http://localhost:9080 开启探索。
## 如何构建
@@ -66,8 +52,8 @@
### Standalone模式构建
-下载源码包,运行脚本"assembly/bin/build-standalone.sh",将所有服务一起编译打包
+下载源码包,运行脚本"assembly/bin/build-standalone.sh",将所有服务一起编译打包。
### Distributed模式构建
-下载源码包,分别运行脚本"assembly/bin/build-chat.sh"、"assembly/bin/build-semantic.sh",为问答层服务和语义层服务编译打包
\ No newline at end of file
+下载源码包,分别运行脚本"assembly/bin/build-chat.sh"、"assembly/bin/build-semantic.sh",为问答层服务和语义层服务编译打包。
\ No newline at end of file
diff --git a/docs/images/supersonic_components.png b/docs/images/supersonic_components.png
index 8d950b3ff..4db266acc 100644
Binary files a/docs/images/supersonic_components.png and b/docs/images/supersonic_components.png differ