diff --git a/README.md b/README.md
index 57d0d6048..896d114bc 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 # SuperSonic (超音数)
 
-SuperSonic is the next-generation BI platform that integrates **Chat BI** (powered by LLM) and **Headless BI** (powered by semantic layer). Both paradigms benefit from the integration:
+SuperSonic is the next-generation BI platform that integrates **Chat BI** (powered by LLM) and **Headless BI** (powered by semantic layer). This integration ensures that Chat BI has access to the same curated and governed semantic data models as traditional BI. Furthermore, the implementation of both paradigms benefits from the integration:
 
 - Chat BI's Text2SQL capability gets enhanced with semantic data models.
 - Headless BI's query interface gets augmented with natural language support.
@@ -17,12 +17,14 @@ SuperSonic provides a chat interface that empowers users to query data using nat
 
 ## Motivation
 
-The emergence of Large Language Model (LLM) like ChatGPT is reshaping the way information is retrieved. In the field of data analytics, both academia and industry are primarily focused on leveraging LLM to convert natural language into SQL (so called Text2SQL or NL2SQL). While some approaches exhibit promising results, their **reliability** and **efficiency** are insufficient for real-world applications.
+The emergence of Large Language Model (LLM) like ChatGPT is reshaping the way information is retrieved, leading to a new paradigm in the field of data analytics known as Chat BI. To implement Chat BI, both academia and industry are primarily focused on harnessing the power of LLMs to convert natural language into SQL, commonly referred to as Text2SQL or NL2SQL. While some approaches show promising results, their **reliability** falls short for large-scale real-world applications.
+
+Meanwhile, another emerging paradigm called Headless BI, which focuses on constructing unified semantic data models, has garnered significant attention. Headless BI is implemented through a universal semantic layer that exposes consistent data semantics via an open API.
+
+From our perspective, the integration of Chat BI and Headless BI has the potential to enhance the Text2SQL capability in two dimensions:
 
-From our perspective, the key to filling the real-world gap lies in three aspects:
 1. Incorporate data semantics (such as business terms, column values, etc.) into the prompt, enabling LLM to better understand the semantics and **reduce hallucination**.
 2. Offload the generation of advanced SQL syntax (such as join, formula, etc.) from LLM to the semantic layer to **reduce complexity**.
-3. Utilize rule-based semantic parsers when necessary to **improve efficiency**(in terms of latency and cost).
 
 With these ideas in mind, we develop SuperSonic as a practical reference implementation and use it to power our real-world products. Additionally, to facilitate further development we decide to open source SuperSonic as an extensible framework.
 
@@ -30,6 +32,7 @@ With these ideas in mind, we develop SuperSonic as a practical reference impleme
 
 - Built-in Chat BI interface for *business users* to enter natural language queries
 - Built-in Headless BI interface for *analytics engineers* to build semantic data models
+- Built-in rule-based semantic parser to improve efficiency in certain scenarios
 - Support input auto-completion as well as query recommendation
 - Support four-level permission control: domain-level, model-level, column-level and row-level
 
diff --git a/README_CN.md b/README_CN.md
index e194fb262..68248516a 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -1,6 +1,6 @@
 # SuperSonic (超音数)
 
-**SuperSonic融合Chat BI(powered by LLM)和Headless BI(powered by 语义层)打造新一代的BI平台**。两种BI新范式都从融合中获得收益:
+**SuperSonic融合Chat BI(powered by LLM)和Headless BI(powered by 语义层)打造新一代的BI平台**。这种融合确保了Chat BI能够与传统BI一样访问统一化治理的语义数据模型。此外,两种BI新范式都从中获得收益:
 
 - Chat BI的Text2SQL能力通过语义数据模型得到增强。
 - Headless BI的查询接口通过支持自然语言得到拓展。
@@ -13,12 +13,14 @@
 
 ## 项目动机
 
-大型语言模型(LLMs)如ChatGPT的出现正在重塑信息检索的方式。在数据分析领域,学术界和工业界主要关注利用深度学习模型将自然语言查询转换为SQL查询。虽然一些工作显示出有前景的结果,但它们的可靠性还达不到生产可用的要求。
+大型语言模型(LLM)如ChatGPT的出现正在重塑信息检索的方式,引领数据分析领域的一种新范式,被称为Chat BI。为了实现Chat BI,学术界和工业界主要关注利用LLM的能力将自然语言转换为SQL,通常称为Text2SQL或NL2SQL。尽管一些方法显示出有希望的结果,但它们在大规模实际应用中的可靠性还不足。
 
-在我们看来,为了在实际场景发挥价值,有三个关键点:
-1. 通过在提示词中增加数据语义(如业务术语、列取值等)使LLM对语义有更好的理解,以减少**幻觉**。
-2. 将高级SQL语法(如连接、公式等)的生成从LLM卸载到语义层,以降低**复杂性**。
-3. 在某些特定场景使用基于启发式规则的语义解析器,以提升**效率**。
+与此同时,另一种新兴范式被称为Headless BI,它专注于构建统一的语义数据模型,并引起了广泛的关注。Headless BI通过一个通用的语义层来实现,通过开放的API公开一致的数据语义。
+
+从我们的角度来看,Chat BI和Headless BI的融合有潜力在两个方面增强Text2SQL的能力:
+
+1. 将数据语义(如业务术语、列值等)纳入提示词中,使LLM能够更好地理解语义,以**减少幻觉**。
+2. 将高级SQL语法(如连接、公式等)的生成从LLM卸载到语义层,以**减少复杂度**。
 
 为了验证上述想法,我们开发了SuperSonic项目,并将其应用在实际的内部产品中。与此同时,我们将SuperSonic作为一个可扩展的框架开源,希望能够促进数据问答对话领域的进一步发展。
 
@@ -26,6 +28,7 @@
 
 - 内置Chat BI界面以便*业务用户*输入数据查询。
 - 内置Headless BI界面以便*分析工程师*构建语义模型。
+- 内置基于规则的语义解析器,在特定场景可以提升运行效率。
 - 支持文本输入的联想和查询问题的推荐。
 - 支持四级权限控制:主题域级、模型级、列级、行级。
 