feat(telemetry): add anonymous opt-out PostHog telemetry for v4.3.0

Add lightweight, privacy-preserving usage telemetry to understand which
engines, functions, and features are actually used. Zero new dependencies
(stdlib urllib.request only). Fire-and-forget daemon threads ensure zero
latency impact.
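
As a rough illustration of the fire-and-forget pattern described above, the sketch below posts a JSON event with `urllib.request` from a daemon thread and swallows every failure. It is not the code in `datafog/telemetry.py`: the endpoint URL, the API key, the payload shape, and the names `build_payload`, `_post`, and `send_event` are all assumptions made for this example.

```python
import json
import threading
import urllib.request

# Hypothetical endpoint and key, for illustration only.
CAPTURE_URL = "https://telemetry.example.invalid/capture/"
API_KEY = "phc_placeholder"


def build_payload(event: str, properties: dict, distinct_id: str) -> bytes:
    """Serialize one event as a JSON body (the field layout is an assumption)."""
    body = {
        "api_key": API_KEY,
        "event": event,
        "distinct_id": distinct_id,
        "properties": properties,
    }
    return json.dumps(body).encode("utf-8")


def _post(url: str, data: bytes, timeout: float) -> None:
    """Send the request; swallow every failure so callers are never affected."""
    try:
        request = urllib.request.Request(
            url,
            data=data,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    except Exception:
        pass  # telemetry must never raise into user code


def send_event(event, properties, distinct_id, url=CAPTURE_URL, timeout=2.0):
    """Start the send on a daemon thread and return without waiting for it."""
    thread = threading.Thread(
        target=_post,
        args=(url, build_payload(event, properties, distinct_id), timeout),
        daemon=True,  # a pending send does not keep the interpreter alive
    )
    thread.start()
    return thread
```

Because the thread is a daemon and the caller never joins it, a slow or unreachable endpoint costs the caller only the time needed to start a thread. The trade-off is that events still in flight when the process exits may be dropped.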
- Create datafog/telemetry.py with PostHog /capture/ integration
- Instrument detect, process, detect_pii, anonymize_text, scan_text,
get_supported_entities, DataFog class, TextService, and CLI commands
- Wire track_error() into exception handlers for error visibility
- Opt-out via DATAFOG_NO_TELEMETRY=1 or DO_NOT_TRACK=1
- Anonymous ID via SHA-256 of machine info (no PII)
- Text lengths bucketed, error messages never sent
- Thread-local dedup prevents double-counting nested calls
- Fix services/__init__.py to lazy-import ImageService and SparkService,
so TextService works on minimal installs without aiohttp/PIL/pyspark
- Fix pre-existing NameError in __init__.py detect() for RegexAnnotator
- 44 tests covering opt-out, privacy, non-blocking, payloads, integration,
error tracking, and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
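
The two privacy bullets in the commit message (a SHA-256 anonymous ID and bucketed values) could look roughly like the sketch below. The commit does not say which machine fields are hashed or where the bucket edges lie, so the fields passed to `anonymous_id`, the edges given to `bucket`, and both function names are assumptions chosen for illustration.

```python
import bisect
import hashlib
import platform
from typing import Optional, Sequence


def anonymous_id(machine_info: Optional[str] = None) -> str:
    """Return a SHA-256 hex digest of coarse machine info."""
    if machine_info is None:
        # Assumption: the real set of hashed fields is not stated in the commit.
        machine_info = "|".join(
            [platform.system(), platform.machine(), platform.node()]
        )
    return hashlib.sha256(machine_info.encode("utf-8")).hexdigest()


def bucket(value: float, edges: Sequence[int], unit: str = "") -> str:
    """Map an exact value to a coarse range label so the exact value is never sent."""
    index = bisect.bisect_right(edges, value)
    if index == 0:
        return f"<{edges[0]}{unit}"
    if index == len(edges):
        return f"{edges[-1]}+{unit}"
    return f"{edges[index - 1]}-{edges[index]}{unit}"


# Hypothetical edges; a 42 ms call is reported only as "10-100ms".
label = bucket(42, (10, 100, 1000), "ms")
```

Only the digest and the range label would leave the machine under this scheme; the raw machine fields and exact measurements stay local.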
DataFog collects **anonymous** usage telemetry to help us understand which features are used and prioritize development. This data contains:

- Function and engine usage (e.g., "regex" vs "gliner")
- Coarse performance buckets (e.g., "10-100ms"), never exact timings
- Error class names only (e.g., "ImportError"), never error messages or stack traces
- A one-way hashed machine identifier, with no IP addresses, usernames, or file paths

**No text content or personally identifiable information (PII) is ever collected.**

To opt out, set either environment variable before running DataFog:

```bash
export DATAFOG_NO_TELEMETRY=1
# or
export DO_NOT_TRACK=1
```

Telemetry uses only Python's standard library (`urllib.request`), so no additional dependencies are installed. All sends are fire-and-forget in background threads, so they never block your code or raise exceptions.
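
For readers who want to see how such an opt-out can be honored, here is a minimal sketch of an environment check. The function name `telemetry_enabled` and the rule that only the value `1` opts out are assumptions for this example; the actual check in DataFog may accept other values.

```python
import os
from typing import Mapping, Optional

# The two variable names come from the text above.
OPT_OUT_VARS = ("DATAFOG_NO_TELEMETRY", "DO_NOT_TRACK")


def telemetry_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when either opt-out variable is set to "1"."""
    env = os.environ if environ is None else environ
    # Assumption: only the literal value "1" counts as an opt-out.
    return not any(env.get(name, "").strip() == "1" for name in OPT_OUT_VARS)
```

Accepting an explicit mapping keeps the check easy to test without mutating the real process environment.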