I let LLMs write an Elixir NIF in C; it mostly worked (overbring.com)

68 points by overbring_labs 16 hours ago | 55 comments

flax 16 hours ago

"it mostly worked" is just a more nuanced way of saying "it didn't work". Apparently the author did eventually get something working, but it is false to say that the LLMs produced a working project.

overbring_labs 12 hours ago

What is your definition of "a working project"? It does what it says on the tin (actually it probably does more, because splint throws some warnings...)

jgalt212 15 hours ago

I dunno. Depending on the writer and their particular axe to grind, the definition can vary widely. I would like it to mean "any fixes I needed to make were minimal and not time-intensive."

overbring_labs 12 hours ago

It's more of "yeah it worked, but I had to do a lot of hand-holding" and "it passes the tests but I cannot tell if the code has memory leaks".

Actually, I can tell; I ran splint on the C source and got things like this:

disk_space.c:144:16: Only storage bin.ref_bin (type void *) derived from variable declared in this scope is not released (memory leak)

So I'm looking into a Rust version with Rustler now.
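
For context on that splint warning: it says a buffer allocated in the function can reach a return without being freed or handed off. In erl_nif, a binary allocated with `enif_alloc_binary` has to either be turned into a term with `enif_make_binary` (which transfers ownership to the VM) or be freed with `enif_release_binary` on every other path. The sketch below shows that discipline with plain `malloc`/`free` standing in for the erl_nif calls so it compiles without `erl_nif.h`; `owned_buf`, `build_payload` and the `simulate_failure` flag are hypothetical and not taken from the author's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ErlNifBinary: a heap buffer plus its size. */
typedef struct {
    unsigned char *data;
    size_t size;
} owned_buf;

/* Number of buffers currently allocated; a leak shows up as a nonzero
 * count after every owner has released its buffer. */
static int live_allocations = 0;

/* Plays the role of enif_alloc_binary. Returns 1 on success, 0 on failure. */
static int buf_alloc(owned_buf *b, size_t size) {
    b->data = malloc(size ? size : 1);  /* avoid malloc(0) edge cases */
    if (b->data == NULL) return 0;
    b->size = size;
    live_allocations++;
    return 1;
}

/* Plays the role of enif_release_binary. */
static void buf_release(owned_buf *b) {
    assert(b->data != NULL);  /* catches a double release in debug builds */
    free(b->data);
    b->data = NULL;
    b->size = 0;
    live_allocations--;
}

/* Copies `text` into a fresh buffer. On success, ownership passes to the
 * caller (as it would pass to the VM via enif_make_binary). On the failure
 * path the buffer is released before returning; skipping that release is
 * exactly the kind of leak splint reports. */
static int build_payload(const char *text, int simulate_failure, owned_buf *out) {
    size_t len = strlen(text);
    if (!buf_alloc(out, len)) return 0;
    if (simulate_failure) {
        buf_release(out);
        return 0;
    }
    memcpy(out->data, text, len);
    return 1;
}
```

The early-return branch is the one that tends to leak: it is easy to write `return 0;` after an error check and forget the release.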

Zxian 7 hours ago

I'd also recommend having a look at the Zig library for Elixir (Zigler). For simple stuff you can just inline the Zig code, but it supports pretty complex setups too.

It also comes with a built-in BEAM allocator, so the runtime properly reports NIF memory consumption.

https://hexdocs.pm/zigler/Zig.html

overbring_labs 21 minutes ago

Thanks, one day I'll look into Zig too, because I've been impressed with what TigerBeetle has achieved with it! For now, I'm all-in on Elixir.

brokencode 13 hours ago

Ok. But what are you even reacting to? Who is saying that it produced a working product?

As you said, the very title of the article acknowledged that it didn’t produce a working product.

This is just outrage for the sake of outrage.

h4ny 4 hours ago

> As you said, the very title of the article acknowledged that it didn’t produce a working product.

Then why not say "mostly didn't work"? I read the article and that's the impression I got.

The OP's comment isn't outrage; if anything, you painted it as outrage with a comment that itself reads more like outrage.

overbring_labs 11 hours ago

Amen, thank you for noticing. The goal here was not to produce something of stellar quality, which is anyway out of the question as I don't have the skills/knowledge to evaluate anything other than "it returns the Elixir map I wanted". It was to see if this is feasible at all.

drumnerd 14 hours ago

I would never ever let an LLM anywhere near C code. If you need help from an LLM to write a NIF that performs basic C calls to the OS, you probably can’t check whether it’s safe. At the very least, it needs to pass valgrind.

simonw 12 hours ago

You can use something like Claude Code or Codex CLI and tell it to run valgrind as part of iterating on the code.

true_religion 14 hours ago

Security is a spectrum. If you totally control the input going into a program, it can be safe even if you didn't test it for memory leaks. The only errors that occur will be genuine mistakes, not malicious input, and for many solutions that's fine.

At the very least, it's fine for personal projects, which is something I'm getting into more and more: remembering that computers were meant to create convenience, so writing small programs to make life easier.

Alive-in-2025 12 hours ago

For personal projects, OK, security is different. But outside of that (and I'd do it even for personal projects), you need defense in depth. You think you sanitized your input, but your C program has a bug and a vulnerability, or your Java program or whatever has bugs. Almost everything has some bugs, so vulnerabilities will eventually bite you in your C program, even if you were careful.

Short of some temporary hack, my bad experiences won't let me call anything low risk. I worked at Microsoft years ago, and after the zillions of vulnerabilities that people attacked around the time of Windows 95 and computers going online, my team did serious code reviews of the data access libraries. There were vast numbers of vulnerabilities. A group of 3 or 4 of us would sit in a room for 3 hours a day, one person acting as scribe, and we'd go over this C code that was ancient even then. We found problems everywhere; it was exhausting and shocking. The entire data access infrastructure was riddled with memory leaks, strings that were not length limited, input parameters that were not checked or sanitized, etc. I'm sure it was endemic across all components, not just there. We fixed some things, but we found so much shit.

Thank god I wasn't on the team trying to figure out what to do about those problems. I think they end-of-lifed a lot of stuff.

Muromec 8 hours ago

>The entire data access infrastructure was riddled with memory leaks, strings that were not length limited, input parameters that were not checked or sanitized, etc. I'm sure it was endemic across all components, not just there. We fixed some things, but we found so much shit.

Sounds like the original vibe coding.

SOLAR_FIELDS 11 hours ago

I mean, what I hear from that is that an LLM you tell to write code as safely as possible is probably going to do a better job at it than your average human engineer, and you still have to do the same verification work either way. So why not have the LLM write the code and spend your own time verifying it? In other words, if I give an LLM and an average C developer the same task, who will perform better? Even if the average C developer does better, they take N hours to write it and I still have to spend M cycles reviewing their work. I'd rather have those N hours be spent by a machine, since I have to pay M anyway regardless of whether the code came from a machine or a human.

rs186 11 hours ago

Outside personal projects, my take is that security really just comes in two flavors: CVE vs no CVE. I pick the former.

Ygg2 13 hours ago

> Security is a spectrum.

It's less a spectrum and more that it's relative. It depends on the attacker and what they seek to gain.

An unsecured server is an unsecured server. But there is a world of difference between being attacked by the CIA and by local script kiddies.

overbring_labs 12 hours ago

I mean, you aren't wrong. I'm looking into converting it into Rust with Rustler right now.

abrookewood 8 hours ago

Yep, was wondering why you didn't go down that path in the first place. Seems way safer.

leansensei 4 hours ago

It's because the most familiar thing to me that uses a NIF was Exqlite, so that was my starting point.

Using Rust and Rustler turned out to be way easier and it also now works across Elixir versions 1.14 to 1.18.

abrookewood 50 minutes ago

Great outcome :)

lawik 14 hours ago

I've done this. The NIF worked in the sense that it ran and was a correct enough NIF. It did not work in terms of solving what I needed it to do. Iteration was a bit painful because it was tangled up with a nasty library that needed to be cross-compiled. So when I made a change, it segfaulted, and I bailed.

I essentially ran out of patience and tried another approach. It involved an LLM running C code so I could check the library output compared to my implementation to make sure it was byte-for-byte.
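
A byte-for-byte check like the one described here is easy to keep around as a throwaway harness. A minimal sketch, assuming both outputs are already in memory (how they get there depends on the library, so that part is left out); `first_mismatch` and `report_match` are hypothetical helpers. Reporting the offset of the first differing byte tends to be more useful while iterating than a bare `memcmp` result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the offset of the first byte where the two buffers differ,
 * or -1 if they are identical. If one buffer is a prefix of the other,
 * the mismatch is reported at the end of the shorter one. */
static long first_mismatch(const unsigned char *expected, size_t expected_len,
                           const unsigned char *actual, size_t actual_len) {
    size_t common = expected_len < actual_len ? expected_len : actual_len;
    for (size_t i = 0; i < common; i++) {
        if (expected[i] != actual[i]) return (long)i;
    }
    if (expected_len != actual_len) return (long)common;
    return -1;
}

/* Prints a short report and returns 1 when the buffers match exactly. */
static int report_match(const unsigned char *expected, size_t expected_len,
                        const unsigned char *actual, size_t actual_len) {
    assert(expected != NULL || expected_len == 0);
    assert(actual != NULL || actual_len == 0);
    long at = first_mismatch(expected, expected_len, actual, actual_len);
    if (at < 0) {
        printf("identical (%zu bytes)\n", expected_len);
        return 1;
    }
    printf("first difference at byte %ld\n", at);
    return 0;
}
```

In practice the two buffers would come from running the reference library and your own implementation on the same input.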

The C will never ship. I don't have practice writing C so I am very inefficient at it. I read it okay. LLMs are pretty decent help for this type of scrap code.

jsight 7 hours ago

I once wrote a little generalized YAML templating processor in Python with an LLM's assistance. It was working pretty well and passing a lot of the tests that I was throwing at it!

Then I noticed that some of the tests that failed were failing in really odd ways. Upon closer inspection, the generated processor had made lots of crazy assumptions about what it should be doing based on specific values in YAML keys that were obviously unrelated to any instructions.

Yeah, I agree with the author. This stuff can be incredibly useful, but it definitely isn't anything like an AGI in its current form.

simonw 15 hours ago

For anyone wondering, the article clarifies that "A NIF is a function that is implemented in C instead of Erlang".

I had a bunch of fun getting ChatGPT Code Interpreter to write (and compile and test) C extensions for SQLite last year: https://simonwillison.net/2024/Mar/23/building-c-extensions-...

victorbjorklund 15 hours ago

Not only C. It can be done in any compiled language (C, Rust, Zig, etc.). Not sure if it can be done with a GC language.

toast0 14 hours ago

BEAM loads a shared object; that opens the door to anything.

If you want to use a GC language for NIFs, you'd need to hook up your runtime somehow.

IMHO, it makes more sense to lean into the BEAM and use its resource management... My NIFs have all been pretty straightforward to write. The boilerplate is what it is, and figuring out how to cooperate with the scheduler for long-running code or I/O can be a bit tricky, but if you can do a lot in a BEAM language, the native code ends up being like:

Check the arguments, do the thing, return the results.

cultofmetatron 15 hours ago

I built my startup in Elixir. Love it, but NIFs are one of the few ways you can crash the VM. I don't trust myself to write a NIF in production, and there's no way I'd do it with AI in C. Thank god there are projects like Rustler, which can catch panics before they crash the main VM.

bcardarella 16 hours ago

I tried to do this a few weeks ago: I tried to build a NIF around an existing C lib. I was using Claude Opus and burned over $300 on tokens (I didn't have Pro) with no usable results.

cpursley 16 hours ago

Get Pro. 4 is quite good at Elixir now, but you have to stay on it. 3.5 was not, so I imagine the next version of Claude will be able to handle the more esoteric things like NIFs, etc.

bcardarella 13 hours ago

The issue in this case was Opus was pretty crap at C. It kept introducing segfaults.

ipaddr 15 hours ago

Get Pro 5.. it will work I promise

cpursley 15 hours ago

I've completely refactored my Elixir codebase with Claude 4, added 1,000 more tests to the test suite, and shipped a few more features to actual paying customers faster than I ever have. Tidewave MCP is helpful, as are some custom commands and a well-tuned CLAUDE.md. But you do you.

jtbayly 15 hours ago

Would you be willing to share your CLAUDE.md file contents? I’m vibe coding in Elixir.

cpursley 14 hours ago

I somewhat followed this:

https://elixirforum.com/t/coding-with-llms-conventions-md-fo...

It's not perfect - you often have to remind it not to write imperative style code and to lean on Elixir conventions like "with" statements, function head matching, not reassigning vars, etc.

cschep 15 hours ago

So hard to tell if this is parody or not.

cpursley 14 hours ago

Here's one Claude-vibed project that makes me money, which I run in addition to my SaaS (which is Elixir). I'm not strong in TypeScript and this is an Astro static site, so Claude has been really helpful. The backend is Supabase (Postgres) plus a few background jobs via https://pgflow.dev (pgmq) that fetch and populate job openings and use some AI steps to filter and then classify them into the correct categories (among other things; there's also a job application flow and an automated email newsletter): https://jobsinappraisal.com

I also "vibed" up this: https://livefilter.fly.dev/todos (https://github.com/cpursley/livefilter) and this: https://star-support-demo.vercel.app/en/getting-started (https://github.com/agoodway/star-support-demo)

I hand wrote very little of this code, but can read most of it - the SQL and Elixir at least ;)

cess11 14 hours ago

Is there a reason why you're using 'when is_struct/2' instead of pattern matching here?

https://github.com/cpursley/livefilter/blob/main/lib/live_fi...

napsterbr 12 hours ago

This is clearly low quality, non-idiomatic AI-generated Elixir code. So the likely answer is that "you" did not use this at all; AI did.

I review this kind of AI-generated Elixir code on a daily basis. And it makes me want to go back to ~2022, when code in pull requests actually made sense.

Apologies for the rant, this is just a burnt out developer tired of reviewing this kind of code.

PS: companies should definitely highlight "No low-quality AI code" in job listings as a valid perk.

cpursley 12 hours ago

Fwiw, the date range part of this is the lowest quality, I even have an issue open: https://github.com/cpursley/livefilter/issues/2

In production code I'd do a couple passes and tell it to lean into more function head and guard matching, etc.

But it does compile, and it works: https://livefilter.fly.dev/todos?filters%5Bassigned_to%5D%5B...

Sinidir 14 hours ago

It's not really hard to tell.

weatherlight 15 hours ago

Why C instead of Rust or Zig? Rustler and Zigler exist. I feel like a Vibecoded NIF in C is the absolute last thing I would want to expose the BEAM to.

overbring_labs 11 hours ago

Given the number of issues the code had when I ran splint on the C file, I agree. The question for me was whether I could get something working to get over the "speed bump" of lacking such a function for the API client I'm writing.

I'm now re-vibe-coding it into Rust with the same process, but also using Grok 4 to get better results. It now builds and passes the tests on Elixir 1.14 to 1.18 on macOS and Ubuntu, but I'm still trying to get Grok 3 and 4 to fix the Windows-specific parts of the Rust code.

qualeed 13 hours ago

Why does every post that mentions something other than Rust or Zig get a comment saying "Why not Rust or Zig"?

sodapopcan 13 hours ago

Why did you write your comment in English? Why not Rust or Zig?

leansensei 13 hours ago

Why not C? It made no difference, we're talking about a few function calls.

weatherlight 10 hours ago

Because the author self-admittedly doesn't know C! One of the reasons people use the BEAM VM is that it's robust and fault-tolerant.

A lot of the choices here are made at the expense of the VM's health.

Also, why not just use :disksup.get_disk_info/1? It's immediate, and calling it won't mess with the scheduler the way a custom NIF or a big blocking port might.

I see the above code/lib and just see red flags all over the place.

leansensei 9 hours ago

The post explains why I don't want to use disksup: you have to start an extra application (os_mon) and configure disksup to update the stats more frequently than the default of every 30 minutes.

Do we really need to do all that instead of the equivalent of a df?
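
If the goal is the equivalent of `df`, one detail that tends to trip people up is the Use% column. As far as I know, GNU coreutils df computes it as used / (used + available), rounded up, where used = total - free; blocks reserved for root are excluded from the denominator, so it will not match a naive used / total. A small sketch of that arithmetic, assuming byte counts from a statvfs-style call; `df_use_percent` is a hypothetical helper.

```c
#include <stdint.h>

/* Percentage of space in use, computed the way GNU df's Use% column is
 * (as far as I know): blocks reserved for root are excluded from the
 * denominator and the result is rounded up.
 * total, free_all, available: byte counts as from a statvfs-style call.
 * Returns -1 when the value is undefined (empty or inconsistent input). */
static int df_use_percent(uint64_t total, uint64_t free_all, uint64_t available) {
    if (free_all > total) return -1;
    uint64_t used = total - free_all;
    uint64_t nonroot_total = used + available;
    if (nonroot_total == 0 || nonroot_total < used) return -1; /* empty or overflow */
    if (used > UINT64_MAX / 100) return -1;                    /* 100 * used would overflow */
    uint64_t u100 = used * 100;
    /* Integer division, plus one if there was a remainder (round up). */
    return (int)(u100 / nonroot_total + (u100 % nonroot_total != 0));
}
```

For example, with total = 100, free = 10 and available = 5, this returns 95, whereas used / total would give 90.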

Agree about the C code, which is why the latest version (on GitHub, the HEAD, not yet released in Hex.pm) is now using Rust and Rustler.

ch4s3 13 hours ago

The author was trying to learn about https://github.com/elixir-sqlite/exqlite which uses C.

SweetSoftPillow 11 hours ago

I wonder why the author used weaker models (like Grok 3 when 4 is available, and Gemini 2.5 Flash when Pro is), since the difference in coding quality between these models is significant and the results could be much better.

overbring_labs 11 hours ago

It's just that Grok 3 is faster than 4, so I've set it by default. But point taken, I'll try out the newer ones now that I'm converting it to Rust.

overbring_labs 11 hours ago

Holy moly, you weren't kidding. Grok 4 is so much better. Thanks!

wordofx 12 hours ago

This was built by copy-pasting results from chats? Not using an IDE or CLI like Claude Code or Amp? Why such a manual process? This isn’t 2023…

overbring_labs 11 hours ago

Because what difference would it make, given the bad quality of code?

Also, is Claude Code free to use?

The manual process has the upside that you get to see how the sausage is (badly) made. Otherwise, just YOLO it and put your trust in GenAI completely.

Furthermore, if there is the interim step of pushing to GitHub to trigger the build & test workflow and see if it works on something other than Linux, is the choice of Vibe-Coding IDE really the limiting factor in the entire process?

juped 5 hours ago

So all this arose because you didn't read the docs and note that get_disk_info/1 immediately fetches the data when called? The every-30-minutes-by-default checks are for generating "disk usage is high" event conditions.

leansensei 4 hours ago

Thanks, that was not clear to me from skimming the docs.

However, this NIF also returns more fields than the disksup function.
