Label-Studio: Annotate Text and Image Data for AI and ML training

A few months ago I used streamlit to build a simple UI, so I can collect manually labeled data for a LLM fine-tuning task at work. Streamlit is fine, but the full process of creating a nice UI with required functionalities for data annotation and data storage management wasn’t trivial. Today I found out about label-studio which is an easy to use framework (backend and frontend) for data annotation task. It provides various annotation templates for text, image, audio, and video data! ...

2024-12-19 · 2 min

Quickly Filter and Aggregate Python Lists

Today I came across this brilliant python library called leopards which allows you to do some basic but frequently used filters and aggregations on python lists. I"ve always used pandas for any quick and adhoc work with large lists or CSV files, but leopards sounds a quick and performant alternative when you don"t need to do any fancy data analysis work. Leopards provides following filters: eq: equals and this default filter gt: greater than. gte: greater than or equal. lt: less than lte: less than or equal in: the value in a list of a tuple. e.g. age__in=[10,20,30] contains: contains a substring as in the example. icontains: case-insensitive contains. startswith: checks if a value starts with a query strings. istartswith: case-insensitive startswith. endswith: checks if a value ends with a query strings. iendswith: case-insensitive endswith. isnull: checks if the value matches any of NULL_VALUES which are ("", ".", None, "None", "null", "NULL") e.g. filter__isnull=True or filter__isnull=False A quick example of its usage with filters: ...

2024-12-19 · 2 min

Pydantic Logfire for LLM and API Observability

I’ve been using sentry for automatically logging the errors and exceptions of my python projects. A few months ago I needed to log some information if a specific condition is true in my side project’s backend, but I wasn’t able to do this with sentry. It apparently can only work when something fails, and you can’t capture log messages if there’s no failure or exception. I looked for an affordable and user friendly observability tool and settled on using axiom . It has a generous 500GB ingestion on free tier plan, but you can only view the events for the past 30 days time period. So I’ve been exporting the logs every month into a csv file, since I want to be able to view the trend of some behaviours over time. ...

2024-12-19 · 2 min

Build a search engine, not a vector DB

If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. This is exactly what I’ve been trying to communicate in my org in the past few months. It’s 2024 and we still can’t have a proper search engine in organizations to find relevant information from various sources. While this problem remains to be solved, organizations are adapting RAG and AI into their tooling, but are missing the important R of the RAG: Retrieval. I’ve been an advocate of prioritizing search engines over any AI related tool in the past few months, and I found it refreshing to read about this somewhere else: ...

2024-12-04 · 2 min

Access Google Gemini LLM via OpenAI Python Library

Google Gemini now can be accessed via OpenAI python library: from openai import OpenAI client = OpenAI( api_key="GEMINI_API_KEY", base_url="https://generativelanguage.googleapis.com/v1beta/openai/" ) ## rest of the code as you would use openai It support basic text generation, image input, function calling, structured output, and embeddings. More info and code examples can be found on Gemini docs .

2024-12-01 · 1 min

I Have Two Friends an Introverts Guide to Not Chasing Friendships

This excellent essay from Karolina was the best I’ve read this year. I can related with many of her point on why friendship muscle looks different for every person. Abroad, I meet local people who don’t hang out with expats because they have their Martas. Every time I hear a Dutch or British person say “we’ve known each other since we were little”, I can’t help but feel a pang of regret. I’m moving countries, schools and jobs and leaving those valuable connections behind, while people who stay in their neighbourhood are easily bound to their friends by time and proximity. ...

2024-08-12 · 3 min

Goodbye Microsoft Hello Facebook

Philip writes about how small cost saving policies at Microsoft irritated him as an employee: We used to get Dove Bars and beers all the time. It felt like free food was on offer at least once a week, usually with a pretense of some small milestone to celebrate. Why did we cut stuff like this? (I know the boring fiscal reasons why. I’m asking the deeper why, as in, “Was it worth the savings? Is Microsoft better now that we’ve cut these costs?”) ...

2024-08-04 · 2 min

Country Specific Consent Requirements for Photographing People

I’m an aspiring photographer , and I love taking photos of moments of life on streets. Since I take my camera wherever I go, it was important to me to know the legal requirements of taking and publishing of such photos. I recently came across this nice table on Wikimedia, which categorizes countries on if a consent is required to take photos of identifiable people, and to publish them, and to use them commercially. ...

2024-06-29 · 1 min

Control a smart light with multiple motion sensors in Home Assistant

I’ve been using Home Assistant to control many things at my smart home, including a couple of lights which are toggled on and off with motion sensors. I have one of these in the bathroom, so the motion sensor detects when I enter the bathroom and its light turns on, and the light turns back off if the motion sensor doesn’t detect any motion in a specific period (60 seconds). Setting this automation is very easy using the motion-activated light blueprint , you just need to choose the entities for the motion sensor and the light, and Home Assistant handles all the hassle of figuring out the logic for when to turn the light on or off, and what to do when a motion is detected in the cool off period. ...

2024-06-20 · 3 min

Upstream Productivity

I like this emphasis on the impact of health on productivity: Biological health is upstream of mental health and mental health is upstream of productivity. Adding to this, I believe engaging in physical activities and working on improving one’s fitness has tremendous effect on productivity in long term. I play tennis regularly, the fact that I make progress in the court and I improve and become a better amateur player increases my confidence in me. I can later use that boost of confidence in my work and personal life. ...

2024-06-13 · 1 min

Evidence That You Should Take Your Content Diet More Seriously

Interesting point about various types of content a creator can focus on: A surprising learning is that content funnels do not work. In theory, you produce entertaining, shortform content to get attention. Then once people are hooked they start consuming your deeper, educational, longform content. Except that this doesn’t happen. People who consume entertaining, shortform content are not interested in longform content. They just want more entertaining, shortform content. source ...

2024-06-13 · 1 min

To Chunk or Not to Chunk With the Long Context Single Embedding Models

In his excellent write up on state of the art embedding models, Aapo Tanskanen compares the retrieval score for when the source documents are split into chunks and when they’re not: Transformer-based single embedding models have traditionally had 512 token context windows because of the usage of the original BERT encoder. Newer models, like the BGE-M3, have expanded the token window to much larger scales. It could be tempting to forget chunking and just embed long texts as they are. However, that would mean mashing many topics and entities you might want to search for into a single vector representation, which doesn’t sound like a good idea. ...

2024-06-02 · 2 min

Excuse Me Is There a Problem

Do you have an idea for a new business or product? In his excellent blog post, Jason provides a useful diagram for differentiating between ideas and problems that worth pursuing vs the ones that don’t: He explains each part of the diagram in details in the post. I highly recommend reading it. source

2024-05-27 · 1 min

Lessons After a Half Billion Gpt Tokens

Ken writes about the lessons they’ve learned building new LLM-based features into their product. When it comes to prompts, less is more Not enumerating an exact list or instructions in the prompt produces better results, if that thing was already common knowledge. GPT is not dumb, and it actually gets confused if you over-specify. This has been my experience as well. For a recent project, I first started with a very long and detailed prompt, asking the LLM to classify a text and produce a summary. GPT-4, GPT-3.5, Claude-3-Opus, and Claude-3-Haiku all performed average or poorly. I then experimented with shorter prompts, and with some adjustments I was able to get much better responses with a very much shorter prompt. ...

2024-05-27 · 2 min

Lucky vs Repeatable

“Luck” is one of the most interesting topics to me, and I always find myself paying extra attention to anyone talking about what “luck” is and how it can be somehow controlled. Morgen makes a useful distinction between “lucky” and “repeatable” traits: … a better way to frame luck is by asking: what isn’t repeatable? Did Jeff Bezos get lucky creating Amazon? Not in the same way a lottery winner is lucky, of course. He was visionary and ambitious and savvy to a degree you only see a few times per century. ...

2024-05-03 · 2 min

Prove It

This is a fantastic reminder from Herbert: If you want to do something, like really want to do it, you need to prove it. That starts with you proving it to yourself. In difficult circumstances, do you make excuses? Or do you face your problem head on? Are you willing to make difficult decisions to do what you’ve committed to? Are you willing to put out bad work in order to improve? Are you open to taking feedback—even if it hurts—in order to improve? ...

2024-04-11 · 1 min

Can You Just Quickly Pull This Data for Me

Them: Can you just quickly pull this data for me? Me: Sure, let me just: SELECT * FROM some_ideal_clean_and_pristine.table_that_you_think_exists source

2024-04-01 · 1 min

An Introvert's Guide to Visibility in the Workplace

Melody is basically describing me: Many introverts value depth and thoughtfulness in their work over noise and showmanship. They’re content to contribute without constant recognition or the spotlight. And the consequences part also checks out: While this tendency is admirable, it comes with pitfalls, especially in the modern, remote-first work world where being “out of sight” often equates to being “out of mind.” Perhaps you’ve been overlooked for a promotion because a senior leader wasn’t aware of you or your accomplishments. Or maybe your quiet demeanor has been mistaken for a lack of passion. These experiences may have awakened you to the fact that in today’s competitive workplace, hard work isn’t enough. You need to make sure your efforts are seen and acknowledged to unlock new opportunities and support. ...

2024-03-31 · 3 min

Dealing With Surprising Human Emotions: Desk Moves

After reading this post from Lara, I finally understand why I’ve been feeling strangely uncomfortable whenever my work desks have been moved in the past. Here are humans’ core needs in the BICEPS model: Belonging Community: A feeling of friendship and closeness with a group, or being part of a tight community of any size. Community well-being: People are cared for, the whole group feels happy and healthy. Connection: Feeling kinship and understanding with Improvement/Progress Progress towards purpose: You are helping make progress towards an important goal for the company, your team or your own career/life. Improving the lives of others: You see how your work helps improve things for others Personal growth: Learning/seeing fast growth in yourself in skills that matter to you. Choice Choice: Having flexibility, the chance to have more control over key parts of your world Autonomy: Having clear ownership over a domain where you can do as you wish, without asking for permission Decision-making: The ability to have make decisions about the things that matter to you Equality/Fairness Access to resources (money, time, space, etc) feels fair/equitable Access to information is fair: All groups/people have access to information that is relevant to them Equal reciprocity: You support each other equally Decisions are fair and everyone is treated as equally important Predictability Resources: There’s enough certainty about resources (money, personnel hours, space) so you can focus on your job Time: There’s certainty about when things will occur/when you can prepare for them. Future challenges: You can anticipate and thus can prepare for future challenges Direction: Goals, strategy, and direction stay consistent and don’t change too often/fast Status Status: You hold a title/role that honors your worth among your peers/your industry Visibility: Your work is highly visible to people that matter Recognition: Your work is recognized and appreciated in ways that feel good. source ...

2024-03-22 · 2 min

Llms Shouldn’t Write SQL

Every day a new tools pops out claiming “Throw the data analysts and data scientists of your company away, you don’t need to write SQL anymore, everyone can use data with our groundbreaking ’talk to your data’ tool”, and Benn discusses this: There are thousands of computational devils in details like how to handle nulls. For analysts, describing these specifics in English is inefficient and inexact. For everyone else, they wouldn’t know they need to describe them at all. ...

2024-03-18 · 1 min