This week, we took a break from tools and talked about how to use GenAI for analytics.
The session was conducted by Bala Panneerselvam, a founding member of Applied AI club and the founder of ZORP, with 17+ years in technology and product.
If you missed the session or would like to go through it again, here's the session video: https://youtu.be/B0hSoPrFuDU
Here are the notes from the meeting:
Meeting Purpose
Explore how to use generative AI for data analytics, covering different levels of implementation and key principles.
Key Takeaways
- Five levels of Gen AI for analytics: 1) Basic data upload, 2) AI-assisted tools, 3) Analytics assistants, 4) Analytics copilots, 5) Autonomous agents
- Data governance and privacy are critical concerns when using Gen AI with enterprise data
- Implementing Gen AI for analytics requires careful consideration of data preparation, metadata, and context provision to LLMs
- Future developments in autonomous agents and fine-tuned models will further enhance analytics capabilities
Topics
Overview of Analytics Workflow
- Typical workflow: Data preparation (production DBs, data warehouses, data lakes) → Analytics process (BI tools, ad hoc analysis) → Publishing reports/dashboards
- Data governance becoming increasingly important as organizations grow and data complexity increases
Level 0: Traditional Analytics Tools
- Current tools: Tableau, Looker, Metabase, Superset, Jupyter notebooks, SQL queries in BI tools
- Excel still widely used for data analysis, transformation, and visualization
Level 1: Basic Gen AI Integration
- Uploading CSV files directly to ChatGPT or similar tools for analysis
- Major concern: Data privacy and governance issues when sharing sensitive data with third-party AI services
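One common way to reduce this risk is to pseudonymize sensitive columns before a file leaves your environment. The following is a minimal sketch, not something shown in the session: the column names (`customer_email`, `phone`), the salt, and the sample data are all assumptions made for illustration. Salted hashing only hides the raw values; it is not full anonymization.

```python
import csv
import hashlib
import io

# Hypothetical column names; adjust to the file being shared.
SENSITIVE_COLUMNS = {"customer_email", "phone"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a short, stable, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_csv(csv_text: str, salt: str) -> str:
    """Return a copy of the CSV with the sensitive columns pseudonymized."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only touch sensitive columns that actually exist in this file.
        for col in SENSITIVE_COLUMNS & set(row):
            row[col] = pseudonymize(row[col], salt)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: the same customer appears twice.
sample = "order_id,customer_email,amount\n1,a@example.com,120\n2,a@example.com,80\n"
masked = mask_csv(sample, salt="demo-salt")
print(masked)
```

Because the digest is stable, the same customer still maps to the same token, so grouping and counting continue to work on the masked file.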
Level 2: AI-Assisted Analytics Tools
- Using existing analytics tools with AI assistance (e.g., Google Colab, Jupyter notebooks with AI extensions)
- Allows non-technical users to perform analysis using natural language prompts
- Example demonstrated: Using Colab to analyze sales data with AI-generated code
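To give a sense of what this looks like in practice, a prompt such as "show total revenue by region" might produce code along these lines. This is an illustrative sketch only, not the code from the session's demo; the sales data and column names are made up, and it uses only the Python standard library so it runs anywhere.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for an uploaded sales file.
SALES_CSV = """region,product,units,unit_price
North,Widget,10,2.50
South,Widget,4,2.50
North,Gadget,3,10.00
South,Gadget,6,10.00
"""


def revenue_by_region(csv_text: str) -> dict[str, float]:
    """Sum units * unit_price for each region."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += int(row["units"]) * float(row["unit_price"])
    return dict(totals)


totals = revenue_by_region(SALES_CSV)
# Print regions from highest to lowest revenue.
for region, revenue in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {revenue:.2f}")
```

The value for a non-technical user is that they describe the question and review the result, while the assistant writes the aggregation code.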
Level 3: Analytics Assistants
- Purpose-built AI assistants for data exploration and analysis
- Example tools: Julius.ai, custom-built chatbots connected to databases
- Provides more control over data access and processing compared to Level 1
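One way a custom chatbot can enforce that control is to run model-generated SQL through a guard before it reaches the database. The sketch below is an assumption about how such a guard could look, not the assistant shown in the session: `run_readonly_query` and the `orders` table are hypothetical, and it uses SQLite's `query_only` pragma as a second line of defense.

```python
import sqlite3


def run_readonly_query(conn: sqlite3.Connection, sql: str, max_rows: int = 100):
    """Run a single SELECT statement and return at most max_rows rows."""
    statement = sql.strip().rstrip(";")
    # Conservative check: reject anything that is not one plain SELECT.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("Only single SELECT statements are allowed")
    cursor = conn.execute(statement)
    return cursor.fetchmany(max_rows)


# In-memory demo database with a hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 80.0), (3, "North", 50.0)],
)
conn.commit()
# From here on, SQLite itself rejects writes on this connection.
conn.execute("PRAGMA query_only = ON")

rows = run_readonly_query(
    conn, "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
)
print(rows)
```

The string check is deliberately strict (it also rejects a semicolon inside a string literal); in a real deployment, database-level read-only credentials would be the primary safeguard.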
Level 4: Analytics Copilots
- Integration of analytics capabilities into existing communication tools (e.g., Slack, WhatsApp)
- Allows asynchronous, passive querying and analysis of data
- Useful for busy executives and managers who need quick insights
Level 5: Autonomous Agents
- AI agents that proactively analyze data, identify patterns, and generate insights
- Utilizes historical context, user preferences, and relevance assessment to provide valuable information without explicit queries
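As a very small illustration of the "proactive" part, an agent could periodically compare the latest value of a metric against its recent history and only speak up when the deviation is large. The sketch below assumes a simple z-score rule with a hypothetical `flag_anomaly` helper and made-up order counts; a real agent would add business context, user preferences, and relevance ranking on top.

```python
from statistics import mean, stdev


def flag_anomaly(history: list[float], latest: float, threshold: float = 3.0):
    """Return a short insight string if `latest` is unusual, else None."""
    if len(history) < 2:
        return None  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None  # no variation in history, z-score is undefined
    z = (latest - mu) / sigma
    if abs(z) < threshold:
        return None  # within the normal range, stay silent
    direction = "above" if z > 0 else "below"
    return f"Latest value {latest:g} is well {direction} the recent average of {mu:.1f}"


# Made-up daily order counts for the past week.
daily_orders = [100, 104, 98, 101, 97, 103, 99]
insight = flag_anomaly(daily_orders, latest=140)
quiet_day = flag_anomaly(daily_orders, latest=102)
print(insight)
```

Returning `None` on ordinary days matters as much as the alert itself: an agent that reports everything is no more useful than a dashboard.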
Data Preparation and Governance
- Importance of proper data structuring, labeling, and metadata creation
- Need for systems to identify relevant tables, columns, and data samples for each query
- Potential use of knowledge graphs or other frameworks to help AI understand data relationships
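The idea of picking relevant tables, columns, and samples per query can be sketched with a small metadata catalog and a keyword-overlap score. Everything here is an assumption for illustration: the `CATALOG` contents, the stopword list, and the scoring rule are hypothetical, and a production system might use embeddings or a knowledge graph instead.

```python
import re

# Hypothetical metadata catalog; table and column names are illustrative only.
CATALOG = {
    "orders": {
        "description": "One row per customer order",
        "columns": {
            "order_id": "unique order id",
            "region": "sales region",
            "amount": "order value in USD",
        },
        "sample": {"order_id": 1, "region": "North", "amount": 120.0},
    },
    "employees": {
        "description": "One row per employee",
        "columns": {"employee_id": "unique employee id", "department": "team name"},
        "sample": {"employee_id": 7, "department": "Finance"},
    },
}

STOPWORDS = {"a", "by", "in", "is", "of", "per", "the", "to", "what"}


def keywords(text: str) -> set[str]:
    """Lowercase words in the text, minus common stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def relevant_tables(question: str, catalog: dict, top_k: int = 1) -> list[str]:
    """Rank tables by keyword overlap between the question and their metadata."""
    q = keywords(question)
    scored = []
    for name, meta in catalog.items():
        text = " ".join([name, meta["description"], *meta["columns"], *meta["columns"].values()])
        scored.append((len(q & keywords(text)), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


def build_context(question: str, catalog: dict) -> str:
    """Format the metadata of the relevant tables as prompt context."""
    lines = []
    for name in relevant_tables(question, catalog):
        meta = catalog[name]
        lines.append(f"Table {name}: {meta['description']}")
        for column, description in meta["columns"].items():
            lines.append(f"  {column}: {description}")
        lines.append(f"  Sample row: {meta['sample']}")
    return "\n".join(lines)


question = "What is the total order amount by region?"
context = build_context(question, CATALOG)
print(context)
```

Only the `orders` metadata ends up in the context for this question, which keeps the prompt focused on the tables the LLM actually needs.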
Challenges and Considerations
- Handling multiple interconnected datasets and complex data relationships
- Ensuring accuracy and preventing hallucinations when dealing with large datasets
- Balancing the need to provide enough context against token usage and processing time
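The context-versus-tokens trade-off can be made concrete with a budget: score each candidate snippet of context, then keep the best ones that fit. The sketch below is one simple greedy approach under stated assumptions. The relevance scores and snippets are made up, and `estimate_tokens` uses a rough "about 4 characters per token" heuristic rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def fit_to_budget(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring snippets that fit within the token budget."""
    kept, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept


# Made-up (relevance score, schema snippet) pairs.
snippets = [
    (0.9, "orders(order_id, region, amount): one row per customer order"),
    (0.2, "employees(employee_id, name, department): one row per employee"),
    (0.7, "regions(region, manager): one row per sales region"),
]
selected = fit_to_budget(snippets, budget=32)
for line in selected:
    print(line)
```

With this budget the two most relevant snippets are kept and the low-scoring `employees` snippet is dropped; raising the budget trades cost and latency for more context.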
Future Developments
- Fine-tuned models specifically for analytics tasks (e.g., Claude's finance-focused model)
- Improved integration of AI with existing BI and visualization tools
- Enhanced autonomous agents with better understanding of business context and user needs
Next Steps
- Explore fine-tuned models for specific analytics use cases
- Implement robust data governance practices when using Gen AI for analytics
- Consider developing custom solutions for metadata management and context provision to LLMs
- Investigate ways to leverage query logs and user feedback to improve AI-powered analytics over time
- For those interested, review the code for the custom analytics assistant (to be shared via email)