Skip to the content.

GPT-Auto-Data-Analytics

Automate local data analysis with groups of tool-using GPT agents.

ChatGPT and Code Interpreter has changed the way of data analysis. But have you ever got frustrated about โ€ฆ

This project aims to replicate the online code interpreter experience locally, but also addressing all the issues mentioned above:

๐Ÿš€ Installation

git clone https://github.com/Animadversio/GPT-Auto-Data-Analytics.git
cd GPT-Auto-Data-Analytics
pip install -e .

Usage

# set up openai key env variable
from auto_analytics.tabular_analysis_session import TabularAnalysisPipeline
# Tell the agent some info about your data and columns, and your overall objective.
table_descriptions = """..."""
column_descriptions = """..."""
task_objective = """Perform explorative data analysis of this dataset to uncover relationships among different variables."""
csvpath = "~/GPT-Auto-Data-Analytics/table_data/Diabetes_Blood_Classification.csv"
report_root = "reports"

# Example usage
analysis_session = TabularAnalysisPipeline("Diabetes_Classification", csvpath, report_root=report_root)
# Set up dataset and column descriptions
analysis_session.set_dataset_description(table_descriptions, column_descriptions)
# Let supervisor agent setup and save analysis task
analysis_session.supervisor_set_analysis_task(task_objective, ) 
# Let research assitent perform data analysis
analysis_session.perform_data_analysis(query=None, MAX_ROUND=30)
# Save results to notebook, HTML, and PDF
analysis_session.save_results()

Play with our Colab demo! Open In Colab

Video walk-through of our system

โœ… Current Features ๐ŸŒŸ

How does it work?

Click to expand! **System Overview**. Our high-level idea is to emulate the workflow in research labs: we will assign roles to AI agents, such as research supervisor or student coding agent and let them work in a closed loop. The supervisor, equipped with broader background knowledge, is tasked to set an overarching research goal, and then break it down into detailed code-solvable tasks which are sent to the student. Then, the student coding agent, equipped with coding tool and vision capability, will tackle the tasks one-by-one, synthesizing their findings into reports for the supervisor's review. This iterative process allows the supervisor to refine their understanding and possibly adjust the research agenda, prompting further investigation. Through this collaborative effort, both parties converge to a unified conclusion, culminating in a final report with code, figures and text interpretations. **Internal Coding Loop**. At the core of our system lies the coding agent, an LLM agent who is tasked for conducting interactive data analysis. This agent will take in task objective, and then analyze datasets by outputting code snippets, which are executed in a local IPython kernel. The results of code execution (including error message) will be sent back to LLM in the form of text strings. Notably, when figures are generated, they will also be turned into text descriptions by a multimodal LLM prompted to interpret the figure .

๐ŸŒ Whatโ€™s Next?

Join Us on This Journey

As an end goal, we hope our tools can auto-pilot data analysis on its own. If youโ€™re interested in using our tool in your daily data analysis work or scientific research, weโ€™d love to hear your thoughts and what you find interesting!