Introduction
pandas-ai is an open-source package that lets you use prompts to ask an LLM to analyze the data in a DataFrame (roughly the equivalent of an Excel sheet).
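Usage
The following is taken directly from the project's README:
You can ask PandasAI questions such as finding all rows in a DataFrame where a column's value is greater than 5. The README's sample asks which 5 countries are the happiest: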
import pandas as pd
from pandasai import SmartDataframe

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate a LLM
from pandasai.llm import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")

df = SmartDataframe(df, config={"llm": llm})
df.chat('Which are the 5 happiest countries?')
Output:
6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object
You can also ask PandasAI to run more complex queries. For example, you can ask it to compute the sum of the GDPs of the 2 unhappiest countries:
df.chat('What is the sum of the GDPs of the 2 unhappiest countries?')
Output:
19012600725504
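Both answers above can be cross-checked by hand, without any LLM involved. The following is a minimal sketch in plain Python using the same sample data. The variable names (`rows`, `happiest_5`, `unhappiest_2`, `gdp_sum`) are mine, not from the README, and this is not the code PandasAI generates internally.

```python
# Same sample data as the README example, as plain Python lists.
countries = ["United States", "United Kingdom", "France", "Germany", "Italy",
             "Spain", "Canada", "Australia", "Japan", "China"]
gdp = [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416,
       1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064]
happiness = [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]

# One (country, gdp, happiness_index) tuple per row.
rows = list(zip(countries, gdp, happiness))

# Five happiest: sort by happiness index, highest first, keep the first 5 names.
happiest_5 = [name for name, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)[:5]]

# Two unhappiest: sort ascending, keep the first 2 rows, then add up their GDPs.
unhappiest_2 = sorted(rows, key=lambda r: r[2])[:2]
gdp_sum = sum(g for _, g, _ in unhappiest_2)

print(happiest_5)
print(gdp_sum)
```

The two unhappiest countries in this data are China and Japan, so the sum is 14631844184064 + 4380756541440, which agrees with the README output. On a real DataFrame the same checks would normally be written with `nlargest` and `nsmallest`.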
Ask PandasAI to draw a chart:
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
Output: [chart image omitted]
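The main event: how do we use it for free?
With an OpenAI LLM as the core, we need an OpenAI API key. Doesn't that mean we have to pay? (っ °Д °;)っ
But!!! What if I don't want to pay? What can I say, I'm a penny-pincher (;´д`)ゞ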
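You can refer to my previous two articles:
Developing and using Document GPT for free (what ChatGPT can't answer, this can!)
Using the OpenAI API completely free, with auto-generating multilingual README.md files as the example
From those two articles, we know that gpt4free uses reverse engineering to let us use OpenAI's gpt-3.5 model for free, with no limit on the number of calls.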
So the same old trick applies: as long as we use gpt4free as the LLM core, we can use pandas-ai without paying! >_<
This is done by subclassing pandas-ai's LLM class, as follows:
import g4f
import pandas as pd
from pandasai import SmartDataframe, PandasAI
from pandasai.llm import LLM
from pandasai.prompts.base import AbstractPrompt


class Gpt4free(LLM):
    """
    Class to wrap gpt4free LLMs and make PandasAI interoperable with gpt4free.
    """

    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        provider: g4f.Provider = None,
        stream: bool = False,
    ):
        """
        __init__ method of Gpt4free Class

        Args:
            model (str): Model of OpenAI API.
            provider (g4f.Provider): The Provider of OpenAI API.
            stream (bool): Completion with streaming.
        """
        self.model = model
        self.provider = provider
        self.stream = stream

    def call(self, instruction: AbstractPrompt, suffix: str = "") -> str:
        # Build the full prompt text and send it as a single user message
        prompt = instruction.to_string() + suffix
        try:
            response = g4f.ChatCompletion.create(
                model=self.model,
                provider=self.provider,
                messages=[{"role": "user", "content": prompt}],
            )
        except Exception as e:
            raise RuntimeError(f"Failed to create chat completion with Gpt4free: {str(e)}") from e
        return response

    @property
    def type(self) -> str:
        return "gpt4free"


# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM
llm = Gpt4free()

df = SmartDataframe(df, config={"llm": llm})
print(df.chat('Which are the 5 happiest countries?'))
print(df.chat('What is the sum of the GDPs of the 2 unhappiest countries?'))

# Plot
# Note: this PandasAI instance is created but not used by the df.chat call below
pandas_ai = PandasAI(llm, verbose=True, save_charts=True)
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
The output should match what the README shows ಥ_ಥ
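To look at the subclassing pattern in isolation, here is a minimal sketch that needs neither pandas-ai nor gpt4free. Everything in it is hypothetical and for illustration only: `BaseLLM` is a stand-in I made up for the library's base class (it is not pandas-ai's actual `LLM`), and `CallableLLM` and `fake_backend` are likewise my own names. The idea is the same as in `Gpt4free`: the subclass supplies a `call` method that turns a prompt into a reply, plus a `type` property that names the backend.

```python
from abc import ABC, abstractmethod
from typing import Callable


class BaseLLM(ABC):
    """Hypothetical stand-in for a library's LLM base class."""

    @abstractmethod
    def call(self, prompt: str, suffix: str = "") -> str:
        """Send the prompt to a backend and return the text reply."""

    @property
    @abstractmethod
    def type(self) -> str:
        """Short identifier for the backend."""


class CallableLLM(BaseLLM):
    """Wraps any function that maps a prompt string to a reply string."""

    def __init__(self, backend: Callable[[str], str], name: str = "custom"):
        self._backend = backend
        self._name = name

    def call(self, prompt: str, suffix: str = "") -> str:
        try:
            return self._backend(prompt + suffix)
        except Exception as e:
            # Re-raise with context, similar to the error handling in Gpt4free.call
            raise RuntimeError(f"Backend '{self._name}' failed: {e}") from e

    @property
    def type(self) -> str:
        return self._name


def fake_backend(prompt: str) -> str:
    # Hypothetical backend for demonstration; a real one would call an API.
    return f"echo: {prompt}"


llm = CallableLLM(fake_backend, name="fake")
reply = llm.call("Which are the 5 happiest countries?", suffix=" Answer briefly.")
print(llm.type, "->", reply)
```

Swapping the backend then only means passing a different function, which is essentially what replacing OpenAI with gpt4free does in the code above.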
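Addendum: I originally opened PR #613 so that users could do this out of the box, without having to declare the Gpt4free subclass themselves. But gpt4free is, after all, a form of reverse engineering, which makes it a poor fit for commercial development, so on reflection I closed the PR. Of course, you can still refer to that PR and patch your local copy of the package directly, but remember not to use it for commercial development...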