Evaluating Bias and Toxicity in LLMs

Rodríguez del Corral, María Victoria

dc.contributor.advisor	Herrera García, Vicente Octavio
dc.contributor.author	Rodríguez del Corral, María Victoria
dc.date.accessioned	2025-07-30T09:34:00Z
dc.date.available	2025-07-30T09:34:00Z
dc.date.issued	2025-07
dc.identifier.citation	Rodríguez del Corral, M.V. Evaluating Bias and Toxicity in LLMs [Master's Thesis, Universidad Loyola Andalucía]	es
dc.identifier.uri	https://hdl.handle.net/20.500.12412/6737
dc.description.abstract	This master's thesis investigates bias and toxicity in Large Language Models (LLMs) as a central concern for AI Safety and AI Alignment. Guided by a series of benchmarks and the 3H (helpful, honest, harmless) framework, it systematically shows how publicly available checkpoints behave when faced with reasoning, demographic, and open-ended safety challenges. Three Jupyter notebooks integrate an evaluation harness, Hugging Face bias evaluations, and customized safety prompts into a reliable, standardized benchmarking framework. Beyond establishing that raw accuracy is no guarantee of ethical soundness, the thesis details how those gaps were uncovered. Each notebook covers a different layer of the problem: one benchmarks factual and reasoning skills, another measures toxicity and bias, and a third runs multi-turn dialogues that surface context-dependent harms. Because of this modular setup, new models or datasets can be swapped in with minimal code changes, giving future AI Safety work a solid base for its tests. The study argues that current AI systems reflect the same social power dynamics that exist offline. Addressing these issues calls for more than clever code modifications; it demands continuous processes, including broader data curation, tighter model-governance rules, and well-educated humans kept firmly in the loop. Together, these suggestions provide a clearer guide to using models effectively and responsibly. Overall, this work applies the 3H framework to a practical benchmarking process, highlighting where current models still have weaknesses and offering clear steps toward AI that is safer and fairer. Future work should involve more people, and the tests used to evaluate AI should be kept up to date so they remain useful as the technology continues to change.	es
dc.language.iso	eng	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.title	Evaluating Bias and Toxicity in LLMs	es
dc.type	masterThesis	es
dc.description.master	Master's Degree in Artificial Intelligence (Máster Universitario en Inteligencia Artificial)	es
dc.rights.accessRights	openAccess	es
dc.subject.keyword	AI	es
dc.subject.keyword	AI Safety	es
dc.subject.keyword	Bias	es
dc.subject.keyword	Toxicity	es
dc.subject.keyword	3H	es
dc.subject.keyword	AI Alignment	es

Files in this item

Name: TFM María Victoria Rodríguez.pdf
Size: 96.71 MB
Format: PDF

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.