This project started as an idea to process 1.3 million rows of financial data interactively. Over time, it evolved into an AI-powered web application that accepts datasets of any size, cleans them, performs feature engineering, and visualizes the results in an intuitive UI, all powered by GPT-4o, LangChain, and AWS services (EC2, S3, RDS, and serverless Lambda).
Live Architecture
- EC2: Flask backend with Gunicorn and Nginx
- S3: Secure file storage for datasets
- RDS (MySQL): Stores metadata and session logs
- Lambda: Triggers OpenAI agents for data cleaning
- Secrets Manager: Secure API key management
- GitHub Actions: Handles CI/CD deployment
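Secrets Manager keeps the OpenAI key out of the codebase. The source doesn't show how the app reads it, so here is a minimal sketch, assuming the key is stored as a JSON secret with a hypothetical `OPENAI_API_KEY` field and fetched through boto3's Secrets Manager client. The client is passed in rather than created inside the function so it can be swapped for a stub.

```python
import json


def get_openai_api_key(client, secret_id: str, field: str = "OPENAI_API_KEY") -> str:
    """Fetch one field from a JSON secret stored in AWS Secrets Manager.

    `client` is expected to be boto3.client("secretsmanager"); taking it as a
    parameter keeps the function testable with a stub. The secret name and the
    OPENAI_API_KEY field are hypothetical.
    """
    response = client.get_secret_value(SecretId=secret_id)
    # get_secret_value returns the secret text under "SecretString".
    return json.loads(response["SecretString"])[field]
```

On EC2 and Lambda, boto3 can pick up credentials from the attached IAM role, so a call such as `get_openai_api_key(boto3.client("secretsmanager"), "data-forge/openai")` (secret name made up) needs no AWS keys in the code.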
Tech Stack
- Backend: Flask, Python, Gunicorn
- Frontend: HTML, Streamlit, Jinja2, TailwindCSS
- AI Agents: GPT-4o, AI Data Science Team, LangChain
- Infrastructure: S3, EC2, RDS, Lambda, Secrets Manager
- CI/CD: GitHub Actions
Let's Connect
If you're a builder, dreamer, or data explorer, reach out. Whether it's a conversation, a collaboration, or just curiosity, I'd love to connect. You can find me on LinkedIn or email me directly.
Step 1: Register / Login
New users create an account, and returning users log in, through a secure Flask authentication flow. Once authenticated, they land on the main dashboard.
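The write-up doesn't say how credentials are stored. A common approach, sketched here as an assumption rather than the app's actual code, is to keep only a salted PBKDF2 hash of each password. `hash_password` and `verify_password` are hypothetical helper names, and the iteration count is an assumed work factor.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for the EC2 instance size


def hash_password(password: str) -> str:
    """Return 'salt$digest' (hex) using salted PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Flask already depends on Werkzeug, whose `werkzeug.security.generate_password_hash` and `check_password_hash` cover the same ground and may be the simpler choice inside a Flask app.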
Step 2: Upload CSV
Users drag and drop datasets directly into the dashboard. Files are uploaded straight to AWS S3 using secure presigned URLs, so large datasets bypass the Flask server and its local upload limits.
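A presigned URL lets the browser send the file straight to S3 while the Flask server only signs the request. A minimal sketch, assuming boto3's `generate_presigned_url` for a `put_object` call; the `raw/` prefix, the key layout, and the 15-minute expiry are hypothetical choices, and the S3 client is passed in so it can be stubbed.

```python
import re
import uuid

UPLOAD_URL_EXPIRY_SECONDS = 15 * 60  # assumed lifetime of an upload URL


def make_upload_key(user_id: str, filename: str) -> str:
    """Build an S3 key under a hypothetical raw/ prefix for one user's upload."""
    if not filename.lower().endswith(".csv"):
        raise ValueError("only .csv uploads are accepted")
    # Drop any path part and keep only safe characters in the file name.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", filename.rsplit("/", 1)[-1])
    # A random prefix keeps two uploads with the same name from colliding.
    return f"raw/{user_id}/{uuid.uuid4().hex}_{safe_name}"


def create_upload_url(s3_client, bucket: str, user_id: str, filename: str) -> dict:
    """Return a presigned PUT URL the browser can upload to directly.

    `s3_client` is expected to be boto3.client("s3"); presigning happens
    locally, so the file itself never passes through the Flask server.
    """
    key = make_upload_key(user_id, filename)
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "text/csv"},
        ExpiresIn=UPLOAD_URL_EXPIRY_SECONDS,
    )
    return {"url": url, "key": key}
```

The browser then PUTs the file to `url` with a `Content-Type: text/csv` header matching the signed parameters, and the app keeps `key` to reference the upload in later steps.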
Step 3: Clean + Engineer
Users trigger an AI-powered pipeline that runs `DataCleaningAgent` and `FeatureEngineeringAgent` on the dataset asynchronously using AWS Lambda.
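The agent code itself isn't shown, so the Lambda side is sketched here with the agents behind a hypothetical `run_agents` hook. It assumes the function is invoked with a hypothetical `{"bucket": ..., "key": ...}` event, reads the raw CSV from S3, and writes the result under the `cleaned/` prefix mentioned in Step 4. `process_dataset` takes the S3 client and the cleaning function as arguments so it can be exercised without AWS.

```python
import json


def cleaned_key_for(raw_key: str) -> str:
    """Swap the first prefix for cleaned/, e.g. raw/42/x.csv -> cleaned/42/x.csv."""
    _, sep, rest = raw_key.partition("/")
    return f"cleaned/{rest if sep else raw_key}"


def process_dataset(s3_client, bucket: str, key: str, clean_fn) -> str:
    """Download a CSV, run the cleaning step on it, and store the output."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    cleaned_csv = clean_fn(body.decode("utf-8"))
    out_key = cleaned_key_for(key)
    s3_client.put_object(Bucket=bucket, Key=out_key, Body=cleaned_csv.encode("utf-8"))
    return out_key


def run_agents(csv_text: str) -> str:
    """Hypothetical hook where DataCleaningAgent and FeatureEngineeringAgent run."""
    raise NotImplementedError("wire the LangChain / GPT-4o agents in here")


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    out_key = process_dataset(boto3.client("s3"), event["bucket"], event["key"], run_agents)
    return {"statusCode": 200, "body": json.dumps({"cleaned_key": out_key})}
```

From Flask, boto3's Lambda client can start this without waiting for it: `invoke(FunctionName=..., InvocationType="Event", Payload=json.dumps({"bucket": ..., "key": ...}))` returns immediately. Because the sketch reads the whole file into memory and Lambda caps a single run at 15 minutes, very large datasets may need chunking or a different compute target.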
Step 4: Store Output
Once processed, cleaned files are saved under a `cleaned/` prefix in S3, and all job metadata, timestamps, and session info are stored in RDS for traceability.
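The RDS schema isn't given. A minimal sketch of the job log, assuming one row per pipeline run; Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical job table; sqlite3 stands in for the RDS MySQL database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id  TEXT NOT NULL,
    raw_key     TEXT NOT NULL,
    cleaned_key TEXT,
    status      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    finished_at TEXT
)
"""


def _now() -> str:
    """UTC timestamp in ISO 8601 form."""
    return datetime.now(timezone.utc).isoformat()


def start_job(conn: sqlite3.Connection, session_id: str, raw_key: str) -> int:
    """Record a new pipeline run and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (session_id, raw_key, status, created_at) "
        "VALUES (?, ?, 'running', ?)",
        (session_id, raw_key, _now()),
    )
    conn.commit()
    return cur.lastrowid


def finish_job(conn: sqlite3.Connection, job_id: int, cleaned_key: str) -> None:
    """Mark a run as done and remember where its cleaned output lives."""
    conn.execute(
        "UPDATE jobs SET cleaned_key = ?, status = 'done', finished_at = ? WHERE id = ?",
        (cleaned_key, _now(), job_id),
    )
    conn.commit()
```

Against MySQL on RDS the same queries work with small changes: the DDL uses `AUTO_INCREMENT`, and drivers such as PyMySQL use `%s` placeholders instead of `?`.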
Step 5: Explore with AI
Users launch a dynamic Streamlit interface that loads a preview of the cleaned data along with visual insights. Features include charts, agentic mind maps, and natural-language-to-SQL (NL → SQL) querying.
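How the NL → SQL feature executes queries isn't described. One defensive pattern, sketched here under that assumption, is to load the cleaned preview into an in-memory SQLite table (hypothetically named `data`) and run the model's SQL against it with writes disabled. The language-model call that produces the SQL is left out.

```python
import csv
import io
import sqlite3


def load_preview(csv_text: str) -> sqlite3.Connection:
    """Load a cleaned CSV preview into an in-memory SQLite table named `data`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Quote column names so headers with spaces or odd characters still work.
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE data ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO data VALUES ({placeholders})", body)
    conn.commit()
    # From here on, SQLite rejects any statement that would modify the data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Run SQL produced by the language model, allowing read queries only."""
    if not sql.lstrip().lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    return conn.execute(sql).fetchall()
```

The prefix check catches obvious mistakes, and `PRAGMA query_only` is the backstop: even a query that starts with `WITH` but ends in `DELETE` fails with a read-only error.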
"This is super cool! I remember struggling to bridge that gap between insights and code. A tool that shows the 'how' is a total game-changer for learning." - Tran Tien Van
Project Structure
Cloudberry_AWS_Bootcamp/
│
├── Portfolio_V2/             # Core Flask app
│   ├── app.py                # Flask application entry point
│   ├── templates/            # Jinja2 HTML templates
│   ├── static/               # TailwindCSS and JS
│   ├── utils/                # AI pipeline, S3 handlers, and Secrets Manager helpers
│   ├── data_forge_lite/      # Streamlit-powered exploration dashboard
│   └── requirements.txt
│
├── .github/workflows/        # GitHub Actions CI/CD scripts
├── README.txt                # You're reading it!
└── start.sh                  # Launch script for Gunicorn
Run Locally
git clone https://github.com/LannonTheCannon/Cloudberry_AWS_Bootcamp.git
cd Cloudberry_AWS_Bootcamp/Portfolio_V2
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 app.py
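`python3 app.py` starts Flask's built-in development server. In production, `start.sh` launches Gunicorn behind Nginx instead. The repository's actual settings aren't reproduced here; the following `gunicorn.conf.py` is a hypothetical sketch, assuming the Flask object in `app.py` is named `app` and that Nginx proxies to local port 8000.

```python
# gunicorn.conf.py (hypothetical); start.sh could run:
#   gunicorn -c gunicorn.conf.py app:app
import multiprocessing

# Listen only on localhost; Nginx forwards public traffic here.
bind = "127.0.0.1:8000"

# Gunicorn's documented starting point: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Assumed request timeout in seconds (Gunicorn's default is 30).
timeout = 60

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```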