Lannon Khau

Exo Explorer

By Lannon Khau

· 5 min read
Exo-Explorer is an AI-powered web app that cleans NASA's extensive exoplanetary data, engineers features from it, and analyzes it to predict habitability scores. Built with GPT-4o, LangGraph, and H2O AutoML, it automates scientific discovery, combining rule-based filtering with probabilistic modeling to surface potentially life-supporting worlds across the galaxy.

🧠 Local AI Architecture

  • Streamlit: Frontend interface for interactive exploration
  • LangGraph + GPT-4o: Orchestrates a modular data-cleaning agent with memory and human-in-the-loop review
  • Pandas + SQLite: Stores planetary data and model outputs locally
  • H2O AutoML: Trains models to predict exoplanet habitability with automated feature selection and tuning
  • OpenAI + LangChain: Powers agent-based code generation and reasoning
  • Local CSV & SQL Storage: Persists cleaned datasets and predictions for download or analysis

Backend

Python
SQLAlchemy

Frontend

Streamlit

AI/ML

OpenAI GPT-4o
AI Data Science Team
LangChain
LangGraph
H2O AutoML

CI / CD

GitHub Actions

Let’s Connect

If you’re a builder, dreamer, or data explorer, reach out. Whether it's a conversation, a collaboration, or just curiosity, I’d love to connect. You can find me on LinkedIn or email me directly.


🧪 How It Works

Step 1: Open the App

ExoExplorer Uses the NASA Exoplanet Archive

Users first visualize the dataset in a Streamlit dashboard. The app connects to the NASA Exoplanet Archive and prepares the data for H2O AutoML predictive analysis.
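One way to pull the data behind this step is the archive's TAP service, which accepts an ADQL query over HTTP and can return CSV. The sketch below only builds the request URL. The `pscomppars` table and the column names are assumptions about what the app might use, not something this post confirms, and the project may simply read the CSV kept in `dataset/` instead.

```python
from urllib.parse import urlencode

# Public TAP endpoint of the NASA Exoplanet Archive (synchronous queries).
TAP_SYNC_URL = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def build_tap_url(columns, table="pscomppars", limit=None):
    """Return a URL that asks the archive for `columns` from `table` as CSV.

    The table and column names are illustrative assumptions.
    """
    top = f"top {int(limit)} " if limit else ""
    query = f"select {top}{','.join(columns)} from {table}"
    return f"{TAP_SYNC_URL}?{urlencode({'query': query, 'format': 'csv'})}"


url = build_tap_url(["pl_name", "pl_rade", "pl_eqt"], limit=5)
print(url)
```

The resulting URL can be handed to `pandas.read_csv(url)` to get a DataFrame for the dashboard.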

Step 2: Click on Train Model

H2O AutoML Begins Training

The user then clicks on Start Training, which triggers the H2O AutoML pipeline. Using LangGraph and GPT-4o, the pipeline automatically cleans the dataset, engineers features, and trains models on it.
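The training part of this step might look roughly like the following. It is a minimal sketch that assumes the cleaned data is already in a CSV with a binary label column. The `habitable` and `pl_name` column names and both helper functions are hypothetical, and the LangGraph cleaning agent is left out entirely.

```python
def feature_columns(columns, target, drop=("pl_name",)):
    """Every column except the target and identifier-like columns."""
    return [c for c in columns if c != target and c not in drop]


def train_automl(csv_path, target, max_models=10, seed=42):
    """Train H2O AutoML on a cleaned CSV and return the AutoML object."""
    # Imported lazily so the helper above works without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster (requires Java)
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the label as a class

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(
        x=feature_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return aml  # aml.leaderboard ranks the models, aml.leader is the top one


# Column names here are hypothetical.
print(feature_columns(["pl_name", "pl_rade", "pl_eqt", "habitable"], "habitable"))
```

Capping `max_models` keeps a local run short; raising it trades training time for a wider search.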

Step 3: AutoML Leaderboard Generated

A Best Model is Chosen

The pipeline runs a series of data cleaning and feature engineering steps, generating a leaderboard of models. The best-performing model is selected based on metrics like accuracy and F1 score.
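To make the selection criterion concrete, here is a small sketch of how accuracy and F1 can be computed from binary predictions and used to rank candidates. H2O computes its own leaderboard metrics, so this only illustrates the idea; the model names and labels are made up.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels where 1 means habitable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up ground truth and predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0]
candidates = {
    "model_a": [1, 0, 0, 1, 0, 0],
    "model_b": [1, 1, 1, 1, 0, 1],
}

scores = {name: accuracy_and_f1(y_true, preds) for name, preds in candidates.items()}

# Rank by F1 first, then accuracy as a tie-breaker.
best = max(scores, key=lambda name: (scores[name][1], scores[name][0]))
print(best, scores[best])
```

F1 is a sensible primary metric when habitable planets are rare, because a model that labels everything "not habitable" can still score high accuracy.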

Step 4: Probability-Habitability Column Now Populated

Each Planet Record Now Contains a Habitability Score

The best model's predictions are saved back to the SQLite database, enriching each planet record with a habitability score. This allows users to filter and explore potentially habitable worlds.
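The write-back could be sketched with Python's built-in `sqlite3` module as below. The `planets` table, the `habitability_score` column, and the scores themselves are placeholders, since the post does not show the real schema.

```python
import sqlite3

# In-memory database for the example; the app would open its file in database/.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE planets (pl_name TEXT PRIMARY KEY, habitability_score REAL)"
)
conn.executemany(
    "INSERT INTO planets (pl_name) VALUES (?)",
    [("Planet A",), ("Planet B",), ("Planet C",)],
)

# Made-up model outputs keyed by planet name.
predictions = {"Planet A": 0.91, "Planet B": 0.12, "Planet C": 0.67}

# Enrich each existing record with its predicted score.
conn.executemany(
    "UPDATE planets SET habitability_score = ? WHERE pl_name = ?",
    [(score, name) for name, score in predictions.items()],
)
conn.commit()

# Filter for planets the model considers more likely habitable than not.
likely = conn.execute(
    "SELECT pl_name FROM planets "
    "WHERE habitability_score >= ? ORDER BY habitability_score DESC",
    (0.5,),
).fetchall()
print(likely)
```

Parameterized queries (the `?` placeholders) keep planet names with quotes or unusual characters from breaking the SQL.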

Step 5: Explore The Exoplanet

Planets With Highest Habitability Score

Explore the actual worlds with the highest habitability scores. Visit the NASA Exoplanet Archive to view the full details of each planet, including its characteristics and potential for supporting life.
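A shortlist like this can combine a rule-based filter with the model's probability, in the spirit of the approach described at the top of the post. In the sketch below, the radius and temperature thresholds are illustrative assumptions rather than the app's actual rules, and the planets are invented.

```python
import heapq


def passes_rules(planet, radius_range=(0.5, 1.6), temp_range=(180.0, 310.0)):
    """Rule-based check on radius (Earth radii) and equilibrium temperature (K).

    The default thresholds are illustrative assumptions.
    """
    return (
        radius_range[0] <= planet["pl_rade"] <= radius_range[1]
        and temp_range[0] <= planet["pl_eqt"] <= temp_range[1]
    )


def top_candidates(planets, n=3):
    """Keep planets that pass the rules, then take the n highest scores."""
    eligible = [p for p in planets if passes_rules(p)]
    return heapq.nlargest(n, eligible, key=lambda p: p["score"])


# Invented records; "score" stands in for the model's habitability probability.
planets = [
    {"pl_name": "Planet A", "pl_rade": 1.1, "pl_eqt": 255.0, "score": 0.91},
    {"pl_name": "Planet B", "pl_rade": 11.0, "pl_eqt": 120.0, "score": 0.95},
    {"pl_name": "Planet C", "pl_rade": 1.4, "pl_eqt": 290.0, "score": 0.67},
    {"pl_name": "Planet D", "pl_rade": 0.9, "pl_eqt": 300.0, "score": 0.80},
]

names = [p["pl_name"] for p in top_candidates(planets, n=2)]
print(names)
```

Planet B has the highest score here but is excluded by the rules, which shows why the two stages are worth keeping separate.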


πŸ“ Repository Structure

ExoExplorer/
│
├── Exo_Explorer.py                 # Streamlit app for exoplanet habitability
│
├── cleaning_agent/                 # LangGraph-powered data cleaning agent
│   ├── agent.py                    # Main agent logic and pipeline
│   ├── utils.py                    # Helper functions
│
├── dataset/                        # NASA exoplanet CSV dataset
├── database/                       # SQLite database for storing results
├── models/                         # Saved H2O AutoML models
├── img/                            # Images or diagrams (optional)
│
├── templates/                      # Prompt templates for agent instructions
├── parsers/                        # Output parsers for GPT responses
├── tools/                          # Shared tools/utilities
├── utils/                          # General-purpose utility scripts
│
├── README.md                       # Project overview and instructions
├── requirements.txt                # Core Python dependencies
├── additional-requirements.txt     # Extra deps for experiments or modules
└── .gitignore                      # Ignore Python caches, models, etc.
              

πŸ“ Run Locally

  git clone https://github.com/LannonTheCannon/gen_ai_bootcamp.git
  cd gen_ai_bootcamp/Week3
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  streamlit run Exo_Explorer.py