Reddit Word Cloud Analysis - r/dataisbeautiful — Angelika Kasandra Hołod

Overview

This analysis explores the most discussed topics in the r/dataisbeautiful community. The interactive word cloud below shows the 100 most frequent terms in post titles; clicking a word highlights it.

Interactive Word Cloud

Unique Words: 100
Most Frequent: years
Peak Frequency: 58
Posts Analyzed: 10,000

Top 20 Words by Frequency

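| Rank | Word | Count |
|------|------|-------|
| 1 | years | 58 |
| 2 | 2020 | 50 |
| 3 | covid19 | 45 |
| 4 | year | 43 |
| 5 | world | 39 |
| 6 | per | 38 |
| 7 | states | 37 |
| 8 | last | 36 |
| 9 | map | 35 |
| 10 | every | 34 |
| 11 | day | 33 |
| 12 | population | 33 |
| 13 | data | 32 |
| 14 | since | 31 |
| 15 | people | 31 |
| 16 | state | 28 |
| 17 | top | 28 |
| 18 | deaths | 26 |
| 19 | cases | 25 |
| 20 | time | 25 |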

Interactive word cloud of top 100 terms from r/dataisbeautiful. Click words to highlight.

Key Insights

Dataset Performance

The analysis extracted 6,878 total words from the titles of the top posts, of which 3,029 were unique, at a processing speed of roughly 377 words/second.
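As a quick consistency check, the throughput figure can be recomputed from the totals reported in this document. This is only a sketch: the 18.2-second processing time is taken from the Data Summary table, and the variable names are illustrative.

```python
# Figures reported in this document
total_words = 6_878
unique_words = 3_029
processing_seconds = 18.2  # from the Data Summary table

# Throughput: total words divided by elapsed time
words_per_second = total_words / processing_seconds

# Share of extracted words that are distinct terms
unique_ratio = unique_words / total_words

print(f"{words_per_second:.1f} words/second")
print(f"{unique_ratio:.1%} of extracted words are distinct terms")
```

The recomputed rate agrees with the reported 377 words/second to within rounding.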

Community Interests

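Term frequencies in post titles suggest that the r/dataisbeautiful subreddit focuses on:

Temporal data analysis - Most popular topic ("years", "year", "last", "since")
COVID-19 tracking visualizations - Pandemic data dominates ("covid19", "deaths", "cases")
Geographic mapping - Country/state level analysis ("world", "states", "map")
Population studies - Demographic datasets ("population", "people")
Comparative analysis - "vs", "compared", "difference"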

Methodology

Data Collection

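# Reddit API data collection with PRAW
import praw

# Credentials below are placeholders; supply your own Reddit app values
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET", user_agent="wordcloud-analysis")
subreddit = reddit.subreddit("dataisbeautiful")
# Note: Reddit listings generally return at most about 1,000 items, whatever the limit
posts = subreddit.top(time_filter="year", limit=10000)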

Text Processing

Extraction: Post titles from top posts
Cleaning: URL removal, special character filtering
Normalization: Lowercasing, stopword removal
Tokenization: Word boundary detection
Aggregation: Frequency counting by term
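The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the stopword list, the regular expressions, and the function names `tokenize_title` and `count_terms` are all assumptions. The sketch splits on whitespace before filtering stopwords, and it deletes special characters without inserting a space, which is one way "COVID-19" could end up as the term "covid19".

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

URL_RE = re.compile(r"https?://\S+")
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")


def tokenize_title(title):
    """Clean, lowercase, and split one title, dropping stopwords."""
    text = URL_RE.sub(" ", title)    # cleaning: remove URLs
    text = text.lower()              # normalization: lowercase
    text = SPECIAL_RE.sub("", text)  # cleaning: drop special characters
    tokens = text.split()            # tokenization on whitespace
    return [t for t in tokens if t not in STOPWORDS]


def count_terms(titles):
    """Aggregate term frequencies across an iterable of title strings."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize_title(title))
    return counts


sample_titles = [
    "COVID-19 deaths per state [OC]",
    "Population of the world since 2020 https://example.com/chart",
]
top_terms = count_terms(sample_titles).most_common(100)
```

With PRAW, the titles would come from the `title` attribute of each submission, for example `count_terms(post.title for post in posts)`.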

Visualization

Built with D3.js and d3-cloud for interactive word cloud generation:

Font scaling based on frequency
Spiral layout algorithm for word placement
Rotation variation (-30°, 0°, 30°)
Interactive click-to-highlight functionality
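The page itself renders the cloud with D3.js, but the frequency-to-font-size mapping and the rotation choice can be illustrated in Python for consistency with the collection code. The following is a sketch under stated assumptions: the square-root scale, the 12 to 72 pixel range, and the names `font_size` and `layout_attributes` are hypothetical and not taken from the original implementation. The spiral placement step is left to the layout library and is not reproduced here.

```python
import math
import random

ROTATIONS = (-30, 0, 30)  # degrees, as listed above


def font_size(count, min_count, max_count, min_px=12.0, max_px=72.0):
    """Map a frequency to a font size using a square-root scale (assumed)."""
    if max_count == min_count:
        return max_px
    span = math.sqrt(max_count) - math.sqrt(min_count)
    t = (math.sqrt(count) - math.sqrt(min_count)) / span
    return min_px + t * (max_px - min_px)


def layout_attributes(counts, seed=0):
    """Return per-word size and rotation, most frequent word first."""
    rng = random.Random(seed)  # seeded so the rotations are reproducible
    lo, hi = min(counts.values()), max(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    return [
        {"text": word, "size": font_size(c, lo, hi), "rotate": rng.choice(ROTATIONS)}
        for word, c in ordered
    ]


words = layout_attributes({"years": 58, "2020": 50, "time": 25})
```

A square-root scale keeps the most frequent words from dwarfing the rest; a linear or logarithmic scale would be an equally plausible choice.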

Data Summary

| Metric | Value |
|--------|-------|
| Posts Analyzed | ~10,000 top posts |
| Total Words | 6,878 |
| Unique Terms | 3,029 |
| Processing Time | 18.2 seconds |
| Top Word | "years" (58 occurrences) |
| Data Source | Reddit API (PRAW) |

Limitations

Title bias: Only post titles analyzed (not comments)
Selection bias: Top posts skew toward popular content, and the one-year time filter limits the sample to recent posts
Vocabulary evolution: Memes and slang change over time
Sample coverage: Analysis covers top posts, not full subreddit
