mirror of
https://github.com/tencentmusic/supersonic.git
synced 2025-12-10 11:07:06 +00:00
[improvement][docs]Update README to improve motivation part
This commit is contained in:
@@ -8,14 +8,14 @@
|
|||||||
|
|
||||||
## Motivation
|
## Motivation
|
||||||
|
|
||||||
The emergence of Large Language Model (LLM) like ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLM to convert natural language into SQL (so called text2sql or nl2sql). While some works show promising results, they are still not applicable to real-world scenarios.
|
The emergence of Large Language Model (LLM) like ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLM to convert natural language into SQL (so called text2sql or nl2sql). While some works show promising results, they are still not applicable to real-world scenarios. The biggest obstacle stems from the
|
||||||
|
|
||||||
From our perspective, the key to filling the real-world gap lies in three aspects:
|
From our perspective, the key to filling the real-world gap lies in three aspects:
|
||||||
1. Introduce a semantic layer encapsulating underlying data context(joins, formulas, etc) to reduce **complexity**.
|
1. Introduce a semantic layer encapsulating underlying data context(joins, formulas, etc) to reduce **complexity**.
|
||||||
2. Augment semantic parsing with schema mappers(as a kind of preprocessor) and semantic correctors(as a kind of postprocessor) to improve **accuracy** and **stability**.
|
2. Augment the LLM with schema mappers(as a kind of preprocessor) and semantic correctors(as a kind of postprocessor) to mitigate **hallucination**.
|
||||||
3. Complement the LLM-based semantic parser with rule-based semantic parsers to improve **efficiency**(in terms of latency and cost).
|
3. Utilize heuristic rules to improve **efficiency**(in terms of latency and cost).
|
||||||
|
|
||||||
With these ideas in mind, we develop SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development of data chatbot, we decide to open source SuperSonic as an extensible framework.
|
With these ideas in mind, we develop SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development of ChatBI, we decide to open source SuperSonic as an extensible framework.
|
||||||
|
|
||||||
## Out-of-the-box Features
|
## Out-of-the-box Features
|
||||||
|
|
||||||
|
|||||||
@@ -9,9 +9,9 @@
|
|||||||
大型语言模型(LLMs)如ChatGPT的出现正在重塑信息检索的方式。在数据分析领域,学术界和工业界主要关注利用深度学习模型将自然语言查询转换为SQL查询。虽然一些工作显示出有前景的结果,但它们还并不适用于实际场景。
|
大型语言模型(LLMs)如ChatGPT的出现正在重塑信息检索的方式。在数据分析领域,学术界和工业界主要关注利用深度学习模型将自然语言查询转换为SQL查询。虽然一些工作显示出有前景的结果,但它们还并不适用于实际场景。
|
||||||
|
|
||||||
在我们看来,为了在实际场景发挥价值,有三个关键点:
|
在我们看来,为了在实际场景发挥价值,有三个关键点:
|
||||||
1. 引入语义模型层,封装底层数据的上下文(关联、公式等),降低语义解析的**复杂性**。
|
1. 引入语义模型层,封装底层数据的上下文(关联、公式等),降低SQL生成的**复杂度**。
|
||||||
2. 加入模式映射器和语义修正器,来增强语义解析能力,提升语义解析的**准确性**和**稳定性**。
|
2. 通过一前一后的模式映射器和语义修正器,来缓解LLM常见的**幻觉**现象。
|
||||||
3. 在基于大模型语义解析器基础上,增加基于规则的解析器,提升语义解析的**效率**。
|
3. 利用基于规则的解析器,提升语义解析的**效率**。
|
||||||
|
|
||||||
为了验证上述想法,我们开发了超音数项目,并将其应用在实际的内部产品中。与此同时,我们将超音数作为一个可扩展的框架开源,希望能够促进数据问答对话领域的进一步发展。
|
为了验证上述想法,我们开发了超音数项目,并将其应用在实际的内部产品中。与此同时,我们将超音数作为一个可扩展的框架开源,希望能够促进数据问答对话领域的进一步发展。
|
||||||
|
|
||||||
@@ -30,7 +30,7 @@
|
|||||||
|
|
||||||
<img src="./docs/images/supersonic_components.png" height="65%" width="65%" align="center"/>
|
<img src="./docs/images/supersonic_components.png" height="65%" width="65%" align="center"/>
|
||||||
|
|
||||||
- **知识库(Knowledge Base):** 定期从语义模型中提取相关的模式信息,构建词典和索引,以便后续的模式映射。
|
- **模型知识库(Knowledge Base):** 定期从语义模型中提取相关的模式信息,构建词典和索引,以便后续的模式映射。
|
||||||
|
|
||||||
- **模式映射器(Schema Mapper):** 将自然语言文本在知识库中进行匹配,为后续的语义解析提供相关信息。
|
- **模式映射器(Schema Mapper):** 将自然语言文本在知识库中进行匹配,为后续的语义解析提供相关信息。
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user