
# lastresort

> When there's no API, no WebView, and SSL Pinning blocks every proxy — there's still the screen.

中文版 (Chinese version)

Python 3.8+ · Android · AI Assisted · MIT License

A methodology and toolkit for extracting structured data from any mobile app's screen —
when every other approach has failed.


## The Problem

You need data from a mobile app. You open Charles Proxy. Nothing.

You try mitmproxy. SSL Pinning kills the connection. You try Frida to bypass the pinning. The app checks Certificate Transparency. You decompile the APK. It's talking directly to Firestore. There's no REST API. There never was.

Modern apps are increasingly unscrapable by design:

| What you tried | Why it failed |
|---|---|
| HTTP proxy (Charles, mitmproxy, Fiddler) | SSL Pinning / Certificate Transparency |
| API reverse engineering | No REST API — direct database (Firestore, iCloud, Realm Sync) |
| Web scraping | No web version exists |
| APK decompilation | Obfuscated, or uses proprietary binary protocols |
| Browser automation | App is mobile-only, no WebView |

You've exhausted every option. This is your last resort.

## The Insight

Every app that displays data to a human is, by definition, rendering that data on screen. The screen is a universal, unblockable API. No amount of SSL Pinning, certificate transparency, or proprietary protocols can prevent an app from showing its own data to its own user.

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    Database ──→ Encrypted Channel ──→ App Logic ──→ Screen  │
│         🔒            🔒                 🔒          👁️     │
│    you can't       you can't         you can't     you CAN  │
│    access this     intercept this    reverse this  read this│
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

lastresort turns this insight into a practical, repeatable methodology — assisted by AI to make it fast.

## How It Works

```
┌───────────┐     ┌────────────────┐     ┌────────────────┐     ┌──────────┐
│  Android  │     │      UI        │     │   Extraction   │     │  Struct  │
│  Device   │────→│   Automation   │────→│    Pipeline    │────→│   Data   │
│  (ADB)    │     │ (uiautomator2) │     │  (parse + OCR) │     │  (CSV)   │
└───────────┘     └────────────────┘     └────────────────┘     └──────────┘
                         │                       │
                    Navigate,               Read element
                    scroll,                 attributes,
                    tap, swipe              screenshot + OCR
```

1. Connect to a real Android device via ADB
2. Automate the app UI — scroll through lists, tap into details, swipe between tabs
3. Extract data from UI element attributes (content-desc, text, resourceId) or via OCR on screenshots
4. Output structured data (CSV, JSON, etc.)
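The four steps can be sketched in a few lines of uiautomator2. Everything app-specific here is hypothetical: the package name, the idea that data lives in `content-desc`, and the blob format that `parse_card` expects all come from recon on your own target, not from this project.

```python
import csv
import re

def parse_card(desc):
    """Parse one hypothetical content-desc blob like '3位 なつき スコア83'."""
    m = re.match(r"(\d+)位\s+(.+?)\s+スコア(\d+)", desc)
    if not m:
        return None
    return {"rank": int(m.group(1)), "name": m.group(2), "score": int(m.group(3))}

def main():
    import uiautomator2 as u2              # device-side dependency, kept out of the parser
    d = u2.connect()                       # 1. first device listed by `adb devices`
    d.app_start("com.example.target")      # 2. launch the target app (placeholder package)
    rows = []
    for el in d.xpath('//*[@content-desc!=""]').all():   # 3. read element attributes
        row = parse_card(el.attrib.get("content-desc", ""))
        if row:
            rows.append(row)
    with open("out.csv", "w", newline="") as f:          # 4. structured output
        writer = csv.DictWriter(f, fieldnames=["rank", "name", "score"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    main()
```

Keeping the parser as a pure function means it can be unit-tested (and iterated on with an AI assistant) without a device attached.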

## What Makes This Different

There are plenty of UI automation tools. Here's what lastresort adds:

### 1. A battle-tested methodology

Not "here's an API, go figure it out" — but a step-by-step methodology born from actually scraping apps that fight back:

  1. Recon — Map the app's UI element structure
  2. Identify data carriers — Find where data hides (content-desc? pixels? nowhere obvious?)
  3. Map navigation traps — Does pressing back reset your scroll? Do filters clear?
  4. Design collection phases — Split work to survive navigation side effects
  5. Build extraction pipeline — Element parsing with OCR fallback
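The recon step can itself be mostly automated: dump the UI hierarchy once and count which attributes actually carry data. A sketch (the XML shape mirrors what uiautomator2's `d.dump_hierarchy()` returns; the attribute names are the standard uiautomator ones):

```python
import xml.etree.ElementTree as ET
from collections import Counter

def survey_carriers(hierarchy_xml):
    """Count how many nodes carry data in each candidate attribute."""
    counts = Counter()
    for node in ET.fromstring(hierarchy_xml).iter():
        for attr in ("text", "content-desc", "resource-id"):
            if node.get(attr):            # non-empty attribute = candidate data carrier
                counts[attr] += 1
    return counts
```

If `content-desc` dominates and `resource-id` is always empty, your parser should target content-desc blobs; if everything comes back empty, plan for the OCR fallback from the start.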

### 2. AI makes this practical

The traditional approach to UI automation scraping is agonizing — manually inspecting elements, writing brittle selectors, debugging scroll timing. AI coding agents change the game:

  • Screenshot a page → AI analyzes the element structure and suggests extraction logic
  • Paste a content-desc blob → AI writes the parser function
  • "The scroll keeps missing element #47" → AI adjusts timing and scale parameters
  • "OCR can't read this" → AI adds image preprocessing and fallback strategies

What used to take a week of trial-and-error now takes an afternoon of conversation. See the AI-assisted workflow guide.

### 3. Real patterns from real scraping

Every pattern in this project was extracted from a real scraping engagement, not invented in a vacuum:

| Pattern | Problem it solves |
|---|---|
| Two-Phase Collection | Detail pages that reset list scroll position |
| Fast-Scroll Targeting | Skip hundreds of items to reach a known target |
| Stale Detection | Know when you've hit the bottom of an infinite scroll |
| OCR Fallback | Devices that render text as images (no accessible text) |
| Progress Checkpoint | Resume a 3-hour scrape from where it crashed |
| Back-to-Top Exploit | Turn a navigation bug into a reliable reset mechanism |
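As a taste, Stale Detection boils down to fingerprinting the visible items after every scroll and stopping once the screen stops changing. In this sketch, `visible_texts` and `scroll_once` stand in for your app-specific uiautomator2 calls:

```python
import hashlib

def scrape_until_stale(visible_texts, scroll_once, max_stale=3):
    """Collect items until `max_stale` consecutive scrolls change nothing on screen."""
    items, last, stale = [], None, 0
    while stale < max_stale:
        batch = visible_texts()
        fp = hashlib.sha1("\n".join(batch).encode()).hexdigest()
        if fp == last:
            stale += 1            # identical screen: we may be at the bottom
        else:
            stale = 0
            items.extend(t for t in batch if t not in items)  # drop overlap from partial scrolls
            last = fp
        scroll_once()
    return items
```

Requiring several consecutive identical screens (rather than one) guards against a slow network briefly freezing the list mid-scroll.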

## Real Example: Livestream App Ranking

We scraped a Japanese livestream app that has:

  • No REST API (Android talks directly to Firestore)
  • No web version
  • SSL Pinning on all connections
  • UI rendered with no resourceId — all data packed into content-desc strings

Result: 100+ streamer profiles extracted — rank, name, scores, tags, bio, social links, X/Twitter handle — exported to CSV.

```
rank  name                          score  x_id        bio
1     ****😊🎊魔人&99挑戦🔥          98     @s***1118   ****光昭/****😊🎊...
2     ****ゆり                        71     @i***ri     #YOASOBI #椎名林檎...
3     ****づき🌻ゲリラ昼🌻夕方🌙      83     @h***zk     #ヨルシカ #宇多田ヒカル...
```

The scraper uses the Two-Phase Collection pattern:

  • Phase 1: Scroll the entire ranking list, extracting basic info from each card (never entering detail pages, so scroll position is preserved)
  • Phase 2: For each streamer, navigate from the top to their card, enter their detail page, extract bio/social links, and return
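A compressed sketch of that two-phase structure, with the device actions abstracted into callables (`collect_cards`, `open_detail`, `extract_detail`, and `go_back` are placeholders for the app-specific uiautomator2 code):

```python
def two_phase_scrape(collect_cards, open_detail, extract_detail, go_back):
    # Phase 1: one uninterrupted pass over the list. Never leaving it
    # means the scroll position is never reset by a detail page.
    cards = collect_cards()
    # Phase 2: revisit each card from the top. Navigation side effects
    # are paid once per card instead of corrupting the list pass.
    results = []
    for card in cards:
        open_detail(card)
        results.append({**card, **extract_detail()})
        go_back()
    return results
```

Because each phase touches only one kind of navigation, a crash in Phase 2 never invalidates the list collected in Phase 1, which is what makes checkpoint-and-resume practical.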

See the full example →

## Quick Start

### Run the example

```sh
# Install dependencies
pip install uiautomator2 pytesseract Pillow
brew install tesseract  # macOS (for OCR fallback)

# Connect your Android device
adb devices

# Run the livestream ranking scraper
cd examples/livestream-ranking
python crawler.py
```

### Build your own

1. Read the Methodology Guide (15 min)
2. Read the AI-Assisted Workflow (10 min)
3. Install weditor to analyze your target app:
   ```sh
   pip install weditor
   python -m weditor
   ```
4. Open your target app on the device, analyze the UI structure in weditor
5. Start building with AI assistance — paste screenshots and element attributes into Claude/ChatGPT and iterate

## Tech Stack

| Tool | Role |
|---|---|
| uiautomator2 | Android UI automation (no Appium server needed) |
| Tesseract OCR | Fallback text extraction from screenshots |
| weditor | Visual UI element inspector |
| Python 3.8+ | Everything glued together |

Why uiautomator2 over Appium? No server process, pure Python, lighter dependencies, better for scripting. When you're iterating quickly with AI assistance, the last thing you want is an Appium server crashing between runs.
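The OCR fallback in this stack can stay out of the hot path: try the accessible text first, and only crop-and-OCR when an element renders its text as pixels. A sketch (the `lang="jpn+eng"` choice and the injectable `ocr` hook are illustrative; the default path assumes pytesseract and the tesseract binary are installed):

```python
def text_or_ocr(element_text, screenshot_path, bounds, ocr=None):
    """Prefer the element's accessible text; fall back to OCR on its screen region."""
    if element_text and element_text.strip():
        return element_text.strip()          # fast path: the attribute carried text
    if ocr is None:                          # default fallback: Tesseract via pytesseract
        import pytesseract
        from PIL import Image
        def ocr(path, box):
            return pytesseract.image_to_string(Image.open(path).crop(box), lang="jpn+eng")
    return ocr(screenshot_path, bounds).strip()
```

Injecting `ocr` as a parameter also makes the fallback logic testable without a device or a Tesseract install.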

## Contributing

### Add your own example

The best way to contribute is to add a new example from an app you've scraped:

```
examples/
├── livestream-ranking/     # Existing: Japanese livestream app
└── your-example/           # Your contribution
    ├── README.md           # What you scraped and how
    ├── crawler.py          # The scraper
    ├── config.py           # Configuration
    └── output/             # Sample output
```

Guidelines:

  • De-identify the app if possible — focus on the technique, not the target
  • Document the challenges — what made this app hard to scrape?
  • Document the patterns used — which patterns from docs/common-patterns.md did you use?
  • Include a small sample output so readers can see what gets extracted

### Improve the methodology

Found a new pattern? A better approach to a known problem? PRs to docs/ are welcome.

## When to Use lastresort (and when not to)

Use it when:

  • The app has no API, or the API is locked behind auth you can't replicate
  • SSL Pinning / certificate transparency blocks proxy tools
  • The app is mobile-only with no web version
  • You need data that's visible on screen but inaccessible through any programmatic interface

Don't use it when:

  • There's a public API (use it!)
  • The data is available on a website (use traditional web scraping)
  • You need real-time streaming data (UI automation is inherently slow)
  • You need to scrape at high frequency (this is minutes-per-run, not milliseconds)

## FAQ

Q: Is this legal? A: This project automates reading publicly visible information from a device you own, similar to how a screen reader works. However, always check your target app's Terms of Service and your local laws. This tool is intended for research, personal use, and authorized data collection.

Q: Does this work on iOS? A: The current implementation is Android-only (via uiautomator2 + ADB). iOS UI automation requires XCTest/WebDriverAgent, which is a different stack. The methodology applies equally — contributions for iOS tooling are welcome.

Q: How fast is it? A: Depends on the app and what you're extracting. The livestream example scrapes ~100 profiles with full detail in about 2-3 hours. Phase 1 (list only) takes ~10 minutes. This is not meant for speed — it's meant for getting data that's otherwise impossible to get.

Q: What if the app updates its UI? A: The extraction logic will likely break. That's the trade-off. The methodology is designed to make rebuilding fast — especially with AI assistance, you can adapt to a new UI in under an hour.

## License

MIT


When all else fails, there's still the screen.
