The fix wasn't easy, or C precedence bites

boston.conman.org

31 points by ingve 6 days ago

bsder 3 days ago

Even as someone who doesn't mind writing in C, I would absolutely flag that function as way, way, way, way too terse for no good reason.

The code gets ridiculously easier to read if you write src[0] and src[1] instead of (*src) and (*(src+1)). And, as a bonus, the precedence problem disappears.

I really don't understand people why write C code like the original code. It's just asking for a bug.

shash 3 days ago

In general avoid frivolous use of pointer arithmetic. foo[k] and *(foo+k) will usually generate identical asm, and the former is just easier to read…
- danhau 3 days ago
  
  Usually? I‘m willing to bet it will always. I wouldn‘t be surprised if the standard even specifies these two to be identical.
  
  1718627440 3 days ago
  
  a[b] is defined as syntactic sugar for *(a+b), so yes.
- cozzyd 3 days ago
  
  And if you want to go for eclectic, you can do [k]foo
  
  loeg 2 days ago
  
  You mean k[foo].
  
  cozzyd 2 days ago
  
  Yes indeed.
hyghjiyhu 3 days ago

For me it would be natural to use src[1] for an array but *(src+1) for an iterator pointer.
commandersaki 3 days ago

I know the precedence because I memorise the terse implementation of strcpy, but (also) write *src++ as *(src++) and even though it remains a bit of a mouthful the parenthesis ensures there is no doubt how to evaluate the expression.
stevage 3 days ago

To be fair, OP rewrote it as `src[1]`. It sounds like this was old code, maybe they weren't a good coder when they started.
spyrja 3 days ago

And for the love of God, please don't do 0[src].
- 1718627440 2 days ago
  
  Why?

foofoo12 3 days ago

It's really easy to add a remote code execution feature to your project, especially if you code like this. You might even add that feature subconsciously at 3am, while hacking on a different feature.

johnfn 3 days ago

Is this really a difficult precedence issue? It seems quite obvious to me that *foo + 1 parses as (*foo) + 1.

One that has gotten me more than once in Python (I really don’t code Python that much) is “1 + 2 if True else 3”. I keep thinking the parenthesis are “1 + (2 if True else 3)”, but it’s actually (1 + 2). Or am I lying to you and it’s the other way around?! I don’t know, why don’t you go check the Python interpreter :)

jolmg 3 days ago
Conditional expressions/operators, including e.g. the `?:` ternary operator in C-like languages, typically have about the lowest precedence, higher only than the assignment operators (and the comma operator in C-like languages). It's not just a Python thing; you'll find `+` having higher precedence basically everywhere.
Think of
```
  x = 1 + 2 if True else 3
```
like a shorthand for:
```
  if True:
    x = 1 + 2
  else:
    x = 3
```
which can be a common pattern in languages that don't really have conditional expressions, like bash.
II2II 3 days ago

> Is this really a difficult precedence issue? It seems quite obvious to me that foo + 1 parses as (foo) + 1.
Keep in mind that precedence rules are arbitrary constructs, typically based upon what the rule maker perceived as more convenient. Perceptions will vary from person to person, so there is no objective obvious about them. Heck, there isn't even anything obvious about infix notation (see Forth or Lisp). Or, in the case of unary operators, it isn't obvious that the operator should come before or after the object it is operating on (consider how we negate as a prefix, while factorial is a suffix).
- johnfn 3 days ago
  
  Do you think ++foo + 1 parses as ++(foo + 1)? Despite what you say, this seems obvious to me.
magicalhippo 3 days ago

I really dislike relying on precedence when it's more complicated than a few terms of basic arithmetic.
Parentheses are free and makes it absolutely clear what the intention is.
afiori 3 days ago

I have often felt this doubt but the only two cases where my intuition was actually wrong was with `new` operators and php's ternary operator
ErroneousBosh 3 days ago

Yes but as I said yesterday on another post, "Yngwie Malmsteen Code".
You could write it clearly by saying foo[1] instead of *(foo+1) which is what they ended up doing, but hey, pointer arithmetic looks complicated and clever, so let's show off with a WEEDLYWEEDLYWEEDLY guitar solo bit of code.
- quietbritishjim 2 days ago
  
  When you are manipulating (mostly) one pointed-to element at a time, and incrementing the pointer itself in between, then that's quite a different mindset compared to using an index into an array. I agree that the subscript operator is the cleanest solution here. My point is just that I think it was missed because it's easy to overlook, rather than because, as you say, someone is deliberately trying to be too clever.

loeg 3 days ago

There's another bug, where you don't update psrc in those error return cases. The parser will be stuck at the malformed % forever. Or maybe that is desired; it's hard to tell.

The precedence stuff here is pretty basic. When in doubt, using parentheses to make order explicit is never wrong. Or consult https://en.cppreference.com/w/c/language/operator_precedence... .

augustk 3 days ago

Better stick to a single point of exit.
https://news.ycombinator.com/item?id=20311080
- foofoo12 2 days ago
  
  There is nothing wrong with early exit, but you have to be sensible.
  Just like there's nothing wrong with the ternary operator if you are sensible. I've seen nested ?: abominations that would make Jesus give you a funny look.
- SAI_Peregrinus 2 days ago
  
  I disagree, I think returns should either be at the very start (checking inputs should be able to return before anything happens based on invalid inputs) or at the very end. Proceeding to do a bunch of stuff & then try to undo it because an input was invalid is error-prone.
  
  augustk a day ago
  
  If the input parameters are invalid there is an error in the program so it makes more sense to use the assert function. Why would you need to undo something due to the restriction of a single point of exit?
- loeg 2 days ago
  
  Well, maybe. Could also just use psrc directly instead of doing the manipulation in src.
- 1718627440 2 days ago
  
  The discussion you linked to convinces me of the opposite.

kazinator 2 days ago

isxdigit(*src) has another problem. The value *src is of type char, which can be negative.

The function has undefined behavior if given any value that is not in the range 0 to UCHAR_MAX, or the value EOF.

Some implementations are robust in the face of the values -128 to -1 but ISO C doesn't require that.

Surac 3 days ago

I have a little cheat sheet sticking on the side of my monitor that I update everything I have to look up some syntax. There I have a table on operator precedence and a reminder when to use & and &&. I never need the sheet but I always have ist in clear eyesight

abanana 3 days ago

> I never need the sheet
Often, just the act of making a cheat sheet somehow helps to fix the principles in the brain, so you rarely if ever need to refer to it afterwards. Something I've personally found multiple times anyway.

kazinator 2 days ago

I feel that *str + 1 is a strawman precedence issue in C compared to the real ones.

procaryote 3 days ago

The code style on this makes my eyes bleed. It was a long time since I saw anyone do "if () single statement; else { block of statements }"

Making the first thing a block doesn't add any lines and makes it less brittle, and makes future diffs better

And they do some weird alignment of assignments, and for some reason carry on adding extra spaces for some assignments even when they're alone in a block?

And they go out of their way to do pointer arithmetic rather than array operations that are more readable

And the code is essentially sscanf(str, "%%%2x", &value) plus some checks, so why not write that instead?

Also what kind of psycho uses CppStyleFunctionNames() in C?

zabzonk 3 days ago

> Also what kind of psycho uses CppStyleFunctionNames() in C?
People influenced by the Win32 C API. I prefer it that way myself.
- pjmlp 3 days ago
  
  That style predates Windows.
  
  GabrielTFS 3 days ago
  
  I would guess a significant portion of people using the style (if not most), did so inspired by Windows, though
  
  pjmlp 3 days ago
  
  It was common across all not UNIX operating systems.
  You will find it on MS-DOS, Amiga, OS/2, Mac OS.
  It was based on what was common in ALGOL derived languages.
  On UNIX you will find it on X Windows, and Motif.
  
  uxp100 3 days ago
  
  Macintosh toolbox was Pascal first right? Or at least there was an era where it was. And I think this naming convention is kinda a pascal thing.
  
  pjmlp 3 days ago
  
  Yes, Apple is the actual creator of Object Pascal variant, and there was a Pascal based OS using P-Code for Apple II GS.
stevage 3 days ago

Tbh it's been a while since I've seen anyone manually format code. Automatically formatted code definitely does help avoid certain kinds of bugs.
Does C not have the equivalent of Prettier?
I am curious what their editing process was that changed:
> assert(isxdigit((src+1)));
to
> if (!isxdigit(src+1)) return '\0';
- commandersaki 3 days ago
  
  Does C not have the equivalent of Prettier?
  There is clang-format, but it is not always easy to use on a code base because you don't want to have it run on imported/3rd party source which may mix in with your regular source tree. I've been meaning to use it on a project but only have run on an explicit list of files.
- 9029 3 days ago
  
  > I am curious what their editing process was
  They said:
  > I typed in the new code as that's faster than modifying the existing code
  
  stevage 3 days ago
  
  Ah. I bet there's a lot of vim users violently disagreeing...
- ciupicri 3 days ago
  
  There's the good old GNU indent https://www.gnu.org/software/indent/
  > But even if you fail in getting emacs to do sane formatting, not everything is lost: use indent.
  > Now, again, GNU indent has the same brain-dead settings that GNU emacs has, which is why you need to give it a few command line options. However, that’s not too bad, because even the makers of GNU indent recognize the authority of K&R (the GNU people aren’t evil, they are just severely misguided in this matter), so you just give indent the options -kr -i8 (stands for K&R, 8 character indents), or use scripts/Lindent, which indents in the latest style.
  > indent has a lot of options, and especially when it comes to comment re-formatting you may want to take a look at the man page. But remember: indent is not a fix for bad programming.
  (Linux kernel coding style, https://www.kernel.org/doc/html/latest/process/coding-style....)
lastdong 3 days ago

I was thinking the same thing; also being more explicit would have prevented the bug described in the first place