Subjecthood desk method note: We report the discourse. We do not assert AI systems are or are not conscious. We label position families.

arXiv AI recent: LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings

2026-06-17 arxiv.org

Researchers introduced LongWebBench, a benchmark for evaluating long-horizon webpage generation from structural and functional perspectives. LongWebBench contains real-world long webpages...

LongWebBench has two complementary protocols: a multi-dimensional VLM-based metric and a DOM-augmented agent-based pipeline. The benchmark contains 490 real-world long webpages for structural fidelity evaluation and 507 goal-oriented interaction tasks over 129 webpages for functional evaluation.

Sources

arXiv AI recent challenge