| dc.contributor.advisor | Herrera García, Vicente Octavio | |
| dc.contributor.author | Rodríguez del Corral, María Victoria | |
| dc.date.accessioned | 2025-07-30T09:34:00Z | |
| dc.date.available | 2025-07-30T09:34:00Z | |
| dc.date.issued | 2025-07 | |
| dc.identifier.citation | Rodríguez del Corral, M.V. Evaluating Bias and Toxicity in LLMs [Master's thesis, Universidad Loyola Andalucía] | es |
| dc.identifier.uri | https://hdl.handle.net/20.500.12412/6737 | |
| dc.description.abstract | This master's thesis investigates bias and toxicity in Large Language Models
(LLMs) as a central concern for AI Safety and AI Alignment. Guided by a series
of different benchmarks and the 3H framework, it systematically shows how
publicly available checkpoints behave when faced with reasoning, demographic
and open-ended safety challenges. Three Jupyter notebooks integrate
harness-based evaluation, Hugging Face bias measurements, and customized safety
prompts to deliver a reliable and standardized benchmarking framework.
Beyond establishing that raw accuracy is no guarantee of ethical soundness,
the thesis details how those gaps were uncovered. Each notebook covers
a different layer of the problem: one benchmarks factual and reasoning skills,
another measures toxicity and bias, and a third runs multi‑turn dialogues that
surface context‑dependent harms. This setup means new models or datasets
can be swapped in with minimal code changes, giving future AI Safety work a
solid base for its tests.
The study argues that current AI systems reflect the same social power
dynamics found offline. Addressing those issues calls for more than clever code
modifications; it demands continuous processes, including broader data curation,
tighter model‑governance rules, and well-educated humans kept firmly in the
loop. Together, these suggestions provide a clearer guide to using models
effectively and responsibly.
Overall, this work applies the 3H framework to a practical benchmarking
process, highlighting where current models still have weaknesses and offering
clear steps to develop AI that is safer and fairer. Future work should involve
more people and keep the tests used to check AI up to date so they stay useful
as the technology keeps changing. | es |
| dc.language.iso | eng | es |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
| dc.title | Evaluating Bias and Toxicity in LLMs | es |
| dc.type | masterThesis | es |
| dc.description.master | Máster Universitario en Inteligencia Artificial | es |
| dc.rights.accessRights | openAccess | es |
| dc.subject.keyword | AI | es |
| dc.subject.keyword | AI Safety | es |
| dc.subject.keyword | Bias | es |
| dc.subject.keyword | Toxicity | es |
| dc.subject.keyword | 3H | es |
| dc.subject.keyword | AI Alignment | es |