A few months ago I used Streamlit to build a simple UI so I could collect manually labeled data for an LLM fine-tuning task at work. Streamlit is fine, but building a nice UI with the features needed for data annotation and storage management wasn't trivial.

Today I found out about label-studio, an easy-to-use framework (backend and frontend) for data annotation tasks. It provides annotation templates for text, image, audio, and video data!

I tried it locally with uv, but it can also be self-hosted using docker-compose:

uv add label-studio
uv run label-studio

Then a very nice-looking UI opens in the browser. You need to sign up (why?) and create a new project. When creating the project, I selected Labeling setup > Natural Language Processing > Text Classification and saved it with the default choices (positive, negative, neutral). I then imported a sample dataset from Hugging Face and ended up with this clean, user-friendly interface for labeling the data.
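If your own data lives in plain records rather than a ready-made dataset, one way to load it is a JSON file uploaded through the project's Import dialog. Below is a minimal sketch, assuming the labeling config reads the text from a `$text` variable (as the default Text Classification template does) and that each task is wrapped in a `{"data": {...}}` object. The helper `to_label_studio_tasks` and the sample rows are hypothetical, for illustration only.

```python
import json


def to_label_studio_tasks(records, text_field="text"):
    """Wrap plain records in the {"data": {...}} shape used for JSON import.

    Assumption: the labeling config reads the text from $text, as in the
    default Text Classification template.
    """
    return [{"data": {"text": record[text_field]}} for record in records]


if __name__ == "__main__":
    # Hypothetical rows, e.g. pulled from a dataset with a "sentence" column.
    rows = [
        {"sentence": "The movie was great.", "label": 1},
        {"sentence": "I want my money back.", "label": 0},
    ]
    # Print the tasks as JSON so the output can be redirected to a file.
    print(json.dumps(to_label_studio_tasks(rows, text_field="sentence"), indent=2))
```

Something like `uv run python make_tasks.py > tasks.json` (both file names hypothetical) would produce a file you could upload via Import. The key inside `"data"` has to match the variable name in the labeling config, so adjust it if you change the template.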

label-studio-screenshot

label-studio-screenshot

Label-studio supports multiple users, and it records which user annotated each record and when. You can export the labeled data in various formats, including CSV. Another cool feature is its integration with ML and AI tools: you can run a model on the same records and compare its output with the human-annotated labels.
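As a rough idea of what that comparison boils down to, here is a hand-rolled sketch over a JSON export rather than Label Studio's built-in ML integration. It assumes the export is a list of tasks, each with an `"id"` and an `"annotations"` list whose `"result"` entries carry the chosen label under `value.choices` (the shape used for a Choices control). The function names and the `model_predictions` dict (task id mapped to predicted label) are hypothetical.

```python
from collections import Counter


def human_labels(export):
    """Collect human-chosen labels per task id from an exported task list.

    Assumed shape: task["annotations"][i]["result"][j]["value"]["choices"].
    """
    labels = {}
    for task in export:
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                choices = region.get("value", {}).get("choices")
                if choices:
                    labels.setdefault(task["id"], []).append(choices[0])
    return labels


def majority_label(labels):
    """Most common label per task; ties go to the label seen first."""
    return {
        task_id: Counter(values).most_common(1)[0][0]
        for task_id, values in labels.items()
    }


def agreement(model_predictions, human):
    """Fraction of shared tasks where the model matches the human majority label."""
    shared = [task_id for task_id in model_predictions if task_id in human]
    if not shared:
        return 0.0
    hits = sum(model_predictions[task_id] == human[task_id] for task_id in shared)
    return hits / len(shared)


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed export shape.
    export = [
        {"id": 1, "annotations": [{"result": [{"value": {"choices": ["Positive"]}}]}]},
        {"id": 2, "annotations": [{"result": [{"value": {"choices": ["Negative"]}}]}]},
    ]
    human = majority_label(human_labels(export))
    model_predictions = {1: "Positive", 2: "Neutral"}  # hypothetical model output
    print(f"agreement: {agreement(model_predictions, human):.0%}")
```

With a real export, you would replace the sample with `json.load` on the downloaded file. Taking a majority vote first keeps the score meaningful when several users labeled the same record.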

I will use label-studio next time I have data to label.

