Why We Built Our Own Note-Taking Tool

The problem with notes

My notes are everywhere.

Class notes in Notion. Meeting minutes in a Feishu (飞书) doc that's technically owned by my company. Voice memos on my phone from a Tencent Meeting (腾讯会议) call. A paper notebook for the stuff I think better on paper. A half-filled Obsidian vault I keep meaning to get back to.

None of it is in one place. None of it connects. And when I need to reference what someone said two weeks ago in a project kickoff, I'm digging through four apps and still not finding it.

The problem is how these tools are built: each one is locked to its own ecosystem. Feishu only records Feishu meetings. Tencent Meeting only processes Tencent Meeting calls. Granola only runs on macOS. None of them handle mixed Chinese-English conversations well. If you're switching between 3–5 platforms a day — which most Chinese professionals are — your notes stay fragmented by default.

The existing tools miss something

Granola is the closest thing to what we wanted. It listens to your meeting and produces AI notes afterward. The experience is genuinely good — no bots, no interruptions, just background recording.

But the transcript is the primary input. Your own notes are an afterthought. The output is a summary of what was said, not a document that reflects how you think.

A transcript summary is only as good as what was said out loud. But meetings don't capture everything — context lives in the preparation notes, the pre-read, the decision criteria you'd written down before the call. Granola has no way to incorporate any of that.

And Chinese support is weak. Mixed Chinese-English meetings — where you switch mid-sentence — degrade the experience significantly. Formulas don't render. Slash commands break Chinese input. The editor assumes English-only workflows.

A different philosophy

InkWeave starts from the opposite direction: your notes are the primary input, the transcript fills in what you missed.

Granola:   transcript → AI summary
InkWeave:  your notes + transcript → AI synthesis → structured document

The AI connects what was said to what you were already thinking. The output reflects how you reason, not just what happened in the room.

InkWeave works in three modes:

  • Notes only — paste rough notes, get a structured document (no audio needed)
  • Audio only — upload a recording and generate from the transcript alone
  • Notes + audio — highest quality output, the mode it was designed for
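Under the hood, the three modes reduce to a simple dispatch on which inputs the user supplied. A minimal sketch of that logic (names are hypothetical, not InkWeave's actual code):

```python
from enum import Enum, auto

class Mode(Enum):
    NOTES_ONLY = auto()
    AUDIO_ONLY = auto()
    NOTES_PLUS_AUDIO = auto()

def pick_mode(has_notes: bool, has_audio: bool) -> Mode:
    """Choose the generation mode from which inputs are present."""
    if has_notes and has_audio:
        return Mode.NOTES_PLUS_AUDIO  # highest-quality path
    if has_audio:
        return Mode.AUDIO_ONLY        # transcript-only
    if has_notes:
        return Mode.NOTES_ONLY        # structure rough notes, no audio
    raise ValueError("need at least notes or audio")
```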

Building it together

InkWeave started with a late-night phone call. I was in a low mood — the kind of night where you call a friend without a specific reason. Jialong and I ended up throwing around half-formed ideas the way you do when you're tired but still thinking. At some point the idea clicked just enough for us to say, "okay, let's actually try this." We started building the next day.

Two weeks later, we were pushing commits back and forth at a pace that honestly surprised both of us. Jialong came in with almost no coding background. He didn't carry the same assumptions I'd picked up from classes and past projects — sometimes that meant slower starts, but other times he'd go in a direction I wouldn't even consider. What surprised me was how much I learned just from watching him explore.

One thing became clear pretty quickly: the bottleneck wasn't the system. The AI could process notes, generate structure, and synthesize content almost instantly. What slowed us down was something much more human — figuring out what to ask. Writing clear prompts, structuring inputs, deciding what actually matters. The faster the system got, the more obvious our own thinking speed became as the limiting factor.

The shared exploration ended up being just as important as the product itself.

What we built

The editor is Obsidian-style: Markdown-first, with KaTeX math rendering and Mermaid diagram blocks. No formatting bugs on Chinese input. Image paste works. It's a real editor, not a text box.

For audio, there are two paths. Upload a file and it runs through Whisper (with a local faster-whisper backend for Chinese accuracy). Or record directly in the browser with live transcription and speaker diarization — so the output knows who said what.
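"Who said what" comes from attaching diarization labels to transcript segments and merging consecutive turns from the same speaker. A sketch of that post-processing step, assuming a segment shape like the one below (the actual pipeline's types may differ):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str

def format_transcript(segments: list[Segment]) -> str:
    """Render diarized segments as a speaker-attributed transcript,
    merging consecutive segments from the same speaker into one turn."""
    lines: list[str] = []
    for seg in segments:
        if lines and lines[-1].startswith(f"{seg.speaker}:"):
            lines[-1] += " " + seg.text.strip()
        else:
            lines.append(f"{seg.speaker}: {seg.text.strip()}")
    return "\n".join(lines)
```

The merged, speaker-labeled text is what gets handed to the generation step, so the model can attribute decisions and action items to people rather than to timestamps.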

After transcription, you pick a template and hit Generate. Claude reads both your notes and the transcript together, then produces a structured output matching the template: meeting minutes, class notes, interview evaluation, advisor discussion, project summary, or a custom schema you define.

The templates are Claude system prompts, so adding a new one is just writing a prompt. Users can define their own.
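Since a template is just a system prompt, the whole template system can be little more than a dictionary, with notes and transcript packed into the user message. A minimal sketch (prompt wording and helper names are illustrative, not the shipped prompts):

```python
# Hypothetical template registry: one system prompt per template.
TEMPLATES: dict[str, str] = {
    "meeting_minutes": (
        "You turn a meeting transcript and the user's own notes into "
        "structured minutes: decisions, action items, open questions."
    ),
    "class_notes": (
        "You merge a lecture transcript with the student's notes into "
        "clean study notes with headings and key formulas."
    ),
}

def build_request(template: str, notes: str, transcript: str) -> dict:
    """Assemble a Claude-style chat request: the template becomes the
    system prompt; notes and transcript share the user message."""
    return {
        "system": TEMPLATES[template],
        "messages": [{
            "role": "user",
            "content": f"## My notes\n{notes}\n\n## Transcript\n{transcript}",
        }],
    }
```

Adding a template is adding a dictionary entry, which is what makes user-defined templates cheap to support.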

What was hard to build

Bilingual ASR was harder than expected. Whisper handles Chinese well in isolation, but real meetings mix languages mid-sentence. We evaluated Xunfei (讯飞) ASR and Alibaba ASR alongside Whisper. Faster-whisper with a tuned model ended up winning for accuracy-per-cost on real mixed recordings.
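Comparing ASR backends on mixed-language audio needs a metric that doesn't depend on word boundaries, since Chinese text has none; character error rate is the usual choice. A minimal sketch of CER via edit distance (illustrative only, not the actual evaluation harness):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length.
    Character-level comparison works for mixed Chinese-English text,
    where word-level WER breaks down."""
    r, h = list(ref), list(hyp)
    # Classic dynamic-programming Levenshtein distance, row by row.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)
```

Running this over reference transcripts of real mixed recordings is what "accuracy-per-cost" comparisons like the one above come down to: CER per backend, divided by price per minute.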

The editor-transcript split was a UX problem more than a technical one. How do you show two inputs — notes and transcript — without making the interface feel like two separate tools? The current answer: the editor is primary, the transcript lives in a collapsible panel. Generate connects them invisibly.

Template design took more iteration than we expected. "Meeting notes" sounds simple until you have to define what structure actually helps people remember and act on what happened. The best templates turned out to be opinionated: they extract decisions, action items, and open questions explicitly, rather than just summarizing sections.

What's next

The web app is deployed and running. The next phase is a desktop app (Tauri) that captures system audio silently — no upload, no recording button, just open Tencent Meeting or Feishu and InkWeave is already listening.

That's the version where you end a meeting, switch to InkWeave, and your structured notes are already there.


It's live at inkweave.alexshen.dev if you want to try it. Free to use.