I tried this a few months back with Claude 3.5 writing cadquery code in Cline, with render photos for feedback. I got it to model a few simple things, like a Terraforming Mars city, fairly nicely. However, it still involved a fair bit of coaching. I wrote a simple script to automate the process further, but it went off the rails too often.
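The automation loop described here can be sketched roughly as follows. The helper names (`generate_code`, `render_preview`, `critique`) are hypothetical stand-ins for the LLM call, the render step, and the vision check, not real APIs:

```python
# Sketch of a render-feedback loop for LLM-driven CAD, under the
# assumption that you can render generated code to an image and have
# a vision model judge the result. All helpers are injected stubs.

def run_loop(prompt, generate_code, render_preview, critique, max_rounds=5):
    """Ask the model for CAD code, render it, and feed the critique back
    until it passes or we hit max_rounds (the 'off the rails' guard)."""
    feedback = None
    for _ in range(max_rounds):
        code = generate_code(prompt, feedback)
        image = render_preview(code)      # e.g. export an STL, screenshot it
        ok, feedback = critique(image)    # vision model judges the render
        if ok:
            return code
    return None  # bail out instead of letting the loop run away
```

Capping the rounds and returning `None` is one simple way to keep an unattended script from looping forever on a model that keeps regenerating the same flawed part.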
I wonder if the models' improved image understanding also leads to better spatial understanding.
Makes you wonder if there is a place in the pipeline for generating G-code (the motion commands that run CNC mills, 3D printers, etc.).
Being just a domestic 3D printer enthusiast, I have no idea what the real-world issues are in manufacturing with CNC mills; I'd personally enjoy an AI telling me which of the 1000 possible combinations of line width, infill percentage, temperatures, speeds, wall-generation parameters, etc. to use for a given print.
There is some industry usage of AI in G-code generation, but it often requires at least some post-processing. In general, if you just want a few parts without hard tolerances, it can be pretty good. But when you need to churn out thousands, it's worth it to go in and manually optimize to squeeze out those precious machine hours.
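G-code itself is just text, so a toy generator is easy to sketch. Here is a minimal, illustrative example (not how production slicers or CAM tools work) that emits only the motion commands for a single square perimeter, with no extrusion, temperature, or acceleration handling:

```python
# Toy G-code emitter: trace one square perimeter with G0/G1 moves.
# Real slicers/CAM also compute extrusion amounts, retraction,
# feeds per material, etc. -- this shows only the motion commands.

def square_perimeter(x, y, size, feed=1200):
    """Return G-code lines tracing a square whose corner is at (x, y)."""
    corners = [(x, y), (x + size, y), (x + size, y + size),
               (x, y + size), (x, y)]
    lines = [f"G0 X{corners[0][0]:.2f} Y{corners[0][1]:.2f}"]  # rapid to start
    for cx, cy in corners[1:]:
        lines.append(f"G1 X{cx:.2f} Y{cy:.2f} F{feed}")        # cutting/print move
    return lines
```

For example, `square_perimeter(10, 10, 20)` yields one `G0` rapid followed by four `G1` feed moves back to the start corner.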
Really cool! I'd love to try something like this for quick and simple enclosures. Right now I have some prototype electronics hot-glued to a piece of plywood. It would be awesome to give a GenCAD workflow the existing part STLs (if they exist) and have it roughly arrange everything and then create the 3D model for a case.
Maybe there could be a mating/assembly eval in the future that would work towards that?
About a year ago I had a 2D drawing of a relatively simple part; I uploaded it to ChatGPT and asked it to model it in cadquery. It required some coaching and manual post-processing, but it was able to do it. I have since moved to SolveSpace, since even after using cadquery for years I was spending 50% of my time finding some weird structure to continue my drawing from. SolveSpace is simply much more productive for me.
I've done this, and printed actual models AIs generated. In my experience Grok does the best job with this: it one-shots even the more elaborate designs (with thinking). Gemini often screws up, but it sometimes can (get this!) figure things out if you show it what the errors are, as a screenshot. This in particular gives me hope that some kind of RL loop can be built around this. OpenAI models screw up and can't fix the errors (a common symptom: generating a slightly different model with the exact same flaws). DeepSeek is at about the same level as OpenAI at OpenSCAD. I have not tried Claude.
You've got to be a bit more specific; those words can all refer to many models.
Typically only the most powerful models are worth a try, and even then they feel like they aren't capable enough. This is not surprising: to the best of my knowledge, none of the current SOTA models was trained to reason about 3D geometry. With Grok there's just one model: Grok 3. With OpenAI I used o1 and o3 (after o3 was released). With Google, the visual feedback was with Gemini 2.5 Pro. DeepSeek also serves only one model. Where there is a toggle (Grok and DeepSeek), "thinking" was enabled.
Wow! As someone who's written OpenSCAD scripts manually, I can get really excited about this.