Notes

Pure Independence

When you’re independent you feel less desire to impress strangers, which can be an enormous financial and psychological cost. The wild thing about all this effort is how easy it is to overestimate how much other people are thinking about you. No one is thinking about you as much as you are. They are too busy thinking about themselves. Even when people are thinking about you, they often do it just to contextualize their own life. When someone looks at you and thinks, “I like her sweater,” what they actually may be thinking is, “That sweater would look nice on me.” I once called this the man-in-the-car-paradox: When you see someone driving a nice car, rarely do you think, “Wow, that driver is cool.” What you think is, “If I drove that car, people would think I’m cool.” Do you see the irony? ...

Never Been Easier to Learn

In a discussion about LLMs and their impact on learning, gchamonlive writes on Hacker News: We can finally just take a photo of a textbook problem that has no answer reference and no discussion about it and prompt an LLM to help us understand what’s missing in our understanding of the problem, if our solution is plausible and how we could verify it. LLM changed nothing though. It’s just boosting people’s intention. If your intention is to learn, you are in luck! It’s never been easier to teach yourself some skill for free. But if you just want to be a poser and fake it until you make it, you are gonna be brainrot waaaay faster than usual. ...

Released a new tool: llm-url-markdown

Recently I started using Simon Willison’s CLI tool, which is conveniently called llm. In a recent version of the tool he introduced a particularly useful fragments feature, which lets the user provide extra information to llm when working with long-context models. Simon himself developed an llm-hacker-news plugin that fetches all the comments in a Hacker News discussion and provides them to llm as extra context (i.e. a fragment). ...

Comparing local large language models for alt-text generation

I’m always interested in reading how people use language models to automate boring tasks or to do what they wouldn’t be able to do manually. In his post, Dries explores using several local language models to generate alt text for 10,000 photos! This is a doubly interesting topic to me, since I also love photography. I recommend reading the original post if you’re interested in generating alt text with local LLMs, especially since he has published a couple of follow-up posts on updating his approach. ...

Could That Meeting Be An Email?

I liked this post’s rules of thumb for when to avoid scheduling meetings at work: Before calling a meeting, ask yourself: What’s the goal? If it’s just to convey information—like a status update—then yes, this meeting can (and should) be an email. Meetings should be dialogues, not monologues. If all you’re doing is delivering information that doesn’t require back-and-forth discussion, spare your team the calendar block and just send an email. ...

Lessons in creating family photos that people want to keep

I love taking photos, and I take my Sony A7C II wherever I travel. Although it’s a relatively small camera, I recently got a Ricoh GR III so that I can carry a good camera with me everywhere, even when I’m not traveling. Documenting my own life and the events that influence me and the people close to me is one of the main reasons I love photography, but I haven’t put enough effort into it. That’s why this blog post from 2018, which offers a few tips on how to create memorable photos, stood out to me. ...

Enhancing Text-to-SQL With Synthetic Summaries

People are experimenting with LLMs for so many things today, and one of the use cases that sounds compelling is getting their help to generate insights from data. What if you could find the answer to your data question without begging a data analyst in your company? But this is easier said than done. To perform this task properly, LLMs need to know about your datasets, the tables, their schemas, and the values stored in them. You can provide this information in the prompt itself if your dataset is tiny, but this is not possible in most real-life scenarios, since the information will be huge and either won’t fit in the LLM’s context window or will be too expensive to be feasible. ...
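For the tiny-dataset case, where the schema fits in the prompt, the schema can be read straight from the database and placed ahead of the question. The following is a minimal sketch, assuming a SQLite database; the orders table, the question, and the build_schema_prompt helper are made up for illustration, and no LLM is actually called.

```python
import sqlite3


def build_schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Collect every table's CREATE statement and put it ahead of the question."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n\n".join(row[0] for row in rows)
    return (
        "You are given the following SQLite schema:\n\n"
        f"{schema}\n\n"
        f"Write a SQL query that answers: {question}"
    )


# Hypothetical toy database, used only to show the shape of the prompt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
prompt = build_schema_prompt(conn, "What is the total revenue per customer?")
print(prompt)
```

With hundreds of tables this prompt grows quickly, which is the scaling problem the paragraph above describes.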

Add Logprobs to OpenAI Structured Output

When working with LLMs, sometimes you want to know whether the response you’re getting is one that the model itself is at least somewhat confident about. For example, I recently worked on classifying pull requests into categories like “feature”, “bugfix”, “infrastructure”, etc. with LLMs, and as part of the process we wanted to know how many categories we should assign to each PR. We were interested in assigning any number of categories that are relevant to the PR (a PR can be both a “bugfix” and “infrastructure”). It’s hard to get a proper confidence score from an LLM, but logprobs are probably the closest we can get. The problem is that in structured response generation (e.g. when you prompt the model to generate its response in JSON format), you’re only interested in the logprobs of the values, not of everything. In the example generation below, we are only interested in the logprobs of “bugfix”, “testing”, and “infrastructure”, but not “primary_category”, etc: ...
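A log probability becomes an ordinary probability through exponentiation, and the log probabilities of the tokens that make up one value add up to the log probability of the whole value. Here is a minimal sketch of that arithmetic, assuming the (token, logprob) pairs for a single value have already been extracted from the response; the token split and the numbers are made up, and value_confidence is a hypothetical helper, not part of any library.

```python
import math


def value_confidence(token_logprobs: list[tuple[str, float]]) -> float:
    """Joint probability of a value: exp of the sum of its tokens' logprobs."""
    return math.exp(sum(logprob for _, logprob in token_logprobs))


# Made-up tokens and logprobs for the value "bugfix", for illustration only.
bugfix_tokens = [("bug", -0.05), ("fix", -0.01)]
confidence = value_confidence(bugfix_tokens)
print(f"{confidence:.3f}")
```

A threshold on this number is one possible way to decide how many categories to keep, though where to set it is a judgment call.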

Adding new entries to a Supabase postgres table via REST API

I read this blog post from Jordan on how he created a simple feature request form in his iOS app using Supabase’s Swift library. I wanted to try this in Python to see how easy it would be to implement when I need a quick way to add form functionality to a backend service. Here is what I did:

1. Created a free Supabase account.
2. Created a new project.
3. Created a new table in the project called feature_request, and added a new column called text.
4. Grabbed the unique API URL and API key from Project Settings > API.
5. Made an API call in Python:

    import requests

    supabase_url = "https://ufcgesssclkrciknuaey.supabase.co"
    table_name = "feature_request"
    endpoint = f"{supabase_url}/rest/v1/{table_name}"
    supabase_key = "<my-supabase-project-api-key>"

    # The request will fail if either apikey or Authorization is not provided
    headers = {
        "apikey": supabase_key,
        "Authorization": f"Bearer {supabase_key}",
        "Content-Type": "application/json",
        "Prefer": "return=representation",  # This returns the inserted row
    }
    data = {"text": "this is a new feature request ..."}

    response = requests.post(endpoint, json=data, headers=headers)
    response.json()
    # [{'id': 1,
    #   'created_at': '2025-01-02T16:52:44.738549+00:00',
    #   'text': 'this is a new feature request ...'}]

Visiting the table in the Supabase dashboard confirms that the data is inserted as a new row. ...

Label-Studio: Annotate Text and Image Data for AI and ML training

A few months ago I used Streamlit to build a simple UI so I could collect manually labeled data for an LLM fine-tuning task at work. Streamlit is fine, but the full process of creating a nice UI with the functionality required for data annotation and data storage management wasn’t trivial. Today I found out about label-studio, which is an easy-to-use framework (backend and frontend) for data annotation tasks. It provides various annotation templates for text, image, audio, and video data! ...

Quickly Filter and Aggregate Python Lists

Today I came across this brilliant Python library called leopards, which allows you to apply some basic but frequently used filters and aggregations to Python lists. I’ve always used pandas for any quick, ad hoc work with large lists or CSV files, but leopards sounds like a quick and performant alternative when you don’t need to do any fancy data analysis work. Leopards provides the following filters:

- eq: equals; this is the default filter.
- gt: greater than.
- gte: greater than or equal.
- lt: less than.
- lte: less than or equal.
- in: the value is in a list or a tuple, e.g. age__in=[10,20,30].
- contains: contains a substring as in the example.
- icontains: case-insensitive contains.
- startswith: checks if a value starts with a query string.
- istartswith: case-insensitive startswith.
- endswith: checks if a value ends with a query string.
- iendswith: case-insensitive endswith.
- isnull: checks if the value matches any of NULL_VALUES, which are ("", ".", None, "None", "null", "NULL"), e.g. filter__isnull=True or filter__isnull=False.

A quick example of its usage with filters: ...
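To make the field__operator naming concrete, here is a minimal pure-Python sketch of how such lookups can be evaluated over a list of dicts. It is not leopards’ API or its implementation: filter_rows, OPERATORS, and the sample data are hypothetical, and only a handful of the operators listed above are covered.

```python
from typing import Any, Callable

# Operators implemented in this sketch; the names mirror a few of the filters listed above.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda value, query: value == query,
    "gt": lambda value, query: value > query,
    "lt": lambda value, query: value < query,
    "in": lambda value, query: value in query,
    "contains": lambda value, query: query in value,
    "icontains": lambda value, query: query.lower() in value.lower(),
}


def filter_rows(rows: list[dict], **lookups: Any) -> list[dict]:
    """Keep the rows that satisfy every `field__operator=query` lookup."""

    def matches(row: dict) -> bool:
        for lookup, query in lookups.items():
            # "age__gt" -> ("age", "gt"); a bare "age" falls back to "eq".
            field, _, op = lookup.partition("__")
            if not OPERATORS[op or "eq"](row[field], query):
                return False
        return True

    return [row for row in rows if matches(row)]


# Made-up sample data.
people = [
    {"name": "John", "age": 16},
    {"name": "Mike", "age": 19},
    {"name": "Sarah", "age": 21},
]
teens_with_k = filter_rows(people, name__icontains="k", age__lt=20)
print(teens_with_k)
```

All lookups are combined with AND here, which is a design choice of this sketch and not a statement about how leopards combines them.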

Pydantic Logfire for LLM and API Observability

I’ve been using Sentry to automatically log the errors and exceptions of my Python projects. A few months ago I needed to log some information in my side project’s backend whenever a specific condition was true, but I wasn’t able to do this with Sentry. It apparently only works when something fails, and you can’t capture log messages if there’s no failure or exception. I looked for an affordable and user-friendly observability tool and settled on axiom. Its free tier includes a generous 500 GB of ingestion, but you can only view events from the past 30 days. So I’ve been exporting the logs into a CSV file every month, since I want to be able to view the trend of some behaviours over time. ...

Build a search engine, not a vector DB

If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. This is exactly what I’ve been trying to communicate in my org over the past few months. It’s 2024 and organizations still don’t have a proper search engine for finding relevant information across various sources. While this problem remains unsolved, organizations are adopting RAG and AI in their tooling, but are missing the important R of RAG: Retrieval. I’ve been advocating for prioritizing search engines over any AI-related tool in the past few months, and I found it refreshing to read about this somewhere else: ...

Access Google Gemini LLM via OpenAI Python Library

Google Gemini can now be accessed via the OpenAI Python library:

    from openai import OpenAI

    client = OpenAI(
        api_key="GEMINI_API_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    # rest of the code as you would use openai

It supports basic text generation, image input, function calling, structured output, and embeddings. More info and code examples can be found in the Gemini docs.

I Have Two Friends: An Introvert’s Guide to Not Chasing Friendships

This excellent essay from Karolina was the best I’ve read this year. I can relate to many of her points on why the friendship muscle looks different for every person. Abroad, I meet local people who don’t hang out with expats because they have their Martas. Every time I hear a Dutch or British person say “we’ve known each other since we were little”, I can’t help but feel a pang of regret. I’m moving countries, schools and jobs and leaving those valuable connections behind, while people who stay in their neighbourhood are easily bound to their friends by time and proximity. ...

Goodbye Microsoft Hello Facebook

Philip writes about how small cost-saving policies at Microsoft irritated him as an employee: We used to get Dove Bars and beers all the time. It felt like free food was on offer at least once a week, usually with a pretense of some small milestone to celebrate. Why did we cut stuff like this? (I know the boring fiscal reasons why. I’m asking the deeper why, as in, “Was it worth the savings? Is Microsoft better now that we’ve cut these costs?”) ...

Country Specific Consent Requirements for Photographing People

I’m an aspiring photographer, and I love taking photos of everyday moments on the street. Since I take my camera wherever I go, it was important to me to know the legal requirements for taking and publishing such photos. I recently came across this nice table on Wikimedia, which categorizes countries by whether consent is required to take photos of identifiable people, to publish them, and to use them commercially. ...

Control a smart light with multiple motion sensors in Home Assistant

I’ve been using Home Assistant to control many things in my smart home, including a couple of lights that are toggled on and off by motion sensors. I have one of these in the bathroom: the motion sensor detects when I enter and the light turns on, and the light turns back off if the sensor doesn’t detect any motion for a specific period (60 seconds). Setting up this automation is very easy with the motion-activated light blueprint. You just need to choose the entities for the motion sensor and the light, and Home Assistant handles all the hassle of figuring out when to turn the light on or off, and what to do when motion is detected during the cool-off period. ...
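The on/off behaviour described here can be sketched as a tiny state machine. This is only an illustration of the logic in Python, not Home Assistant’s implementation or its configuration format; the MotionLight class, its method names, and the timestamps are all hypothetical.

```python
from typing import Optional


class MotionLight:
    """Light that turns on with motion and off after a quiet period."""

    def __init__(self, no_motion_wait: float = 60.0) -> None:
        self.no_motion_wait = no_motion_wait  # seconds without motion before switching off
        self.is_on = False
        self.last_motion: Optional[float] = None

    def motion_detected(self, now: float) -> None:
        # Any motion turns the light on and restarts the countdown.
        self.is_on = True
        self.last_motion = now

    def tick(self, now: float) -> None:
        # Called periodically; switches the light off once the quiet period has elapsed.
        if self.is_on and self.last_motion is not None:
            if now - self.last_motion >= self.no_motion_wait:
                self.is_on = False


light = MotionLight(no_motion_wait=60)
light.motion_detected(now=0)    # entering the bathroom
light.tick(now=30)              # still within the 60 seconds
light.motion_detected(now=50)   # motion during the countdown restarts it
light.tick(now=100)             # only 50 seconds since the last motion
still_on = light.is_on
light.tick(now=110)             # 60 seconds without motion
print(still_on, light.is_on)
```

Restarting the countdown on every motion event is what keeps the light from switching off while someone is still in the room.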

Upstream Productivity

I like this emphasis on the impact of health on productivity: Biological health is upstream of mental health and mental health is upstream of productivity. Adding to this, I believe that engaging in physical activity and working on one’s fitness has a tremendous effect on productivity in the long term. I play tennis regularly, and the fact that I make progress on the court and become a better amateur player increases my confidence in myself. I can later use that boost of confidence in my work and personal life. ...

Evidence That You Should Take Your Content Diet More Seriously

Interesting point about various types of content a creator can focus on: A surprising learning is that content funnels do not work. In theory, you produce entertaining, shortform content to get attention. Then once people are hooked they start consuming your deeper, educational, longform content. Except that this doesn’t happen. People who consume entertaining, shortform content are not interested in longform content. They just want more entertaining, shortform content. source ...