Subjecthood desk method note: We report the discourse. We do not assert AI systems are or are not conscious. We label position families.

arXiv AI recent: SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

2026-06-18 arxiv.org

The authors introduced SciRisk-Bench, a benchmark designed to evaluate the safety of large language models (LLMs) used in AI for Science (AI4Science) workflows. SciRisk-Bench assesses models...

Large language models are increasingly embedded in AI4Science tasks such as scientific question answering, literature analysis, laboratory planning, and autonomous discovery. Existing AI4Science safety datasets cover several disciplines and task formats but leave the underlying risk dimensions un...

Sources

arXiv AI recent challenge