Comment by 🚀 stack

=> Re: "SIS Progress Update: One Year Later" | In: u/clseibold

@clseibold: I'll be the last person defending C -- having spent most of my life writing all kinds of languages and tools because it sucks. But it is pretty straightforward and ubiquitous, as well as time-proven, and the older I get, the more I find it more than adequate, holding my nose.

Build systems suck, but honestly, make with all its bullshit, has always worked for me once I figured out how to half-assedly use it. I never really understood the need for elaborate build systems, or want to be involved in projects so wide that they require ridiculous pre-build reconfiguration.

I cut my teeth on 6502 assembly and a hand-built 68K forth-like interpiler, so my worldview is more like Chuck Moore's -- you don't need most of the tooling of modern systems. Like libraries. Usually modern applications will link in 3 layers of bullshit to do every little trivial thing, resulting in the 100MB applications we see today.

As for strings -- I haven't seen a good string library, or needed one. C string library will give you some basics, but as far as I am concerned, you should do your own string manipulations, and you will then avoid ridiculous slowdowns of your code and avoid terrible memory leaks. Only you know what kind of strings to expect and what you want to do with them.

If you don't rely on string libraries, you will perhaps notice that you can replace them with hashes, or count the size while comparing, whatever. You may know in advance the maximum size, or how they change, something a library cannot. You may align them in ways advantageous, split or pack them into cache-efficient structures, or use C's god-given casting abilities to work on them as 32 or 64-bit quantities, or even use SSE operations to massively accelerate processing.
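A minimal sketch of the "work on them as 64-bit quantities" idea, assuming you know in advance that your strings never exceed 8 bytes. The `key8` type and its helpers are hypothetical names for illustration. The sketch uses `memcpy` rather than a direct pointer cast, since the cast can run into alignment and strict-aliasing trouble; compilers typically reduce the `memcpy` to a single load anyway.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width key: at most 8 bytes, zero-padded. */
typedef struct { char bytes[8]; } key8;

/* Pack a short C string into a zero-padded 8-byte key. */
static key8 key8_from(const char *s)
{
    key8 k = {{0}};
    size_t n = strlen(s);
    assert(n <= sizeof k.bytes);   /* the caller knows the maximum size */
    memcpy(k.bytes, s, n);
    return k;
}

/* Load the key as one 64-bit integer (memcpy instead of a pointer cast). */
static uint64_t key8_bits(key8 k)
{
    uint64_t v;
    memcpy(&v, k.bytes, sizeof v);
    return v;
}

/* Equality becomes one integer comparison instead of a byte loop. */
static int key8_equal(key8 a, key8 b)
{
    return key8_bits(a) == key8_bits(b);
}
```

Because the padding is always zero, two keys compare equal exactly when the original strings were equal, as long as the strings themselves contain no embedded zero bytes.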

A string library that is easy to use is not a good thing in my mind. Libraries are maybe for quick and dirty prototyping but frankly, for the weak-minded/corporate coders.

When I write code that will run at least thousands of times a day (and especially if it's millions), I feel it is my duty to make it as efficient in space and time as possible. It physically causes me discomfort to see Python being spun up to do some stupid thing because the coder just doesn't know any better.

And some of the most interesting and challenging problems come from thinking of what your data really is, and how to really deal with it. It is much more intellectually satisfying to figure out solutions that match the needs and the hardware, instead of just slapping on some database engine and a bunch of UI libraries, buying a bigger machine with more drives and calling it a day.

That is the 100X difference that Chuck Moore talked about. SpellBinding (written in C) is a 22KB executable, which could have easily been a 2MB executable with dependencies on 20 libraries alongside an Oracle database server...

So yes, C is dumb, but I appreciate its goal to not get in the way too much -- a blessing for me, and a curse for those who want 'a safe language'.

I suppose I am somewhat unusual because I code for joy, not for money. And generally, I prefer Lisp for prototyping and assembly for implementing.

=> 🚀 stack

Jan 17 · 2 days ago

7 Later Comments ↓

=> 🚀 clseibold [OP] · Jan 17 at 22:51:

@stack I 100% agree about thinking about your data - data-oriented programming. I definitely do not agree about libraries, however. I think it depends on the library, lmao. Good luck writing your own TLS or cryptography code :P

About strings - all the modern programming languages have ways to intern strings. Interning is obviously important. They also all have hashtables, and some of them will even hash strings at compile-time.
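For readers who have not met interning, here is a rough sketch of the idea in C. Everything in it (`intern_table`, `intern`, the 64-slot capacity) is a hypothetical illustration, not how any of the languages mentioned implement it. Each distinct string gets one canonical pointer, so later equality checks are pointer comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INTERN_SLOTS 64   /* hypothetical fixed capacity, a power of two */

typedef struct {
    const char *slots[INTERN_SLOTS];   /* NULL means empty */
    size_t      used;
} intern_table;

/* FNV-1a, a simple and widely used byte hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical pointer for `s`. The table keeps the first pointer it
   sees for each distinct string, so that storage must outlive the table.
   Returns NULL when the table is full. */
static const char *intern(intern_table *t, const char *s)
{
    assert(s != NULL);
    size_t i = fnv1a(s) & (INTERN_SLOTS - 1);
    for (size_t probes = 0; probes < INTERN_SLOTS; probes++) {
        if (t->slots[i] == NULL) {          /* first time we see this string */
            t->slots[i] = s;
            t->used++;
            return s;
        }
        if (strcmp(t->slots[i], s) == 0)    /* already interned */
            return t->slots[i];
        i = (i + 1) & (INTERN_SLOTS - 1);   /* linear probing */
    }
    return NULL;
}
```

The table itself allocates nothing: it only remembers pointers into storage the caller already owns.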

These modern languages do not prevent you from storing strings in fixed arrays. I feel like you don't actually know how the languages I've mentioned above handle strings, so you're just assuming things based on how strings work in Java, Ruby, or some terrible string libraries in C/C++.

Zig, Odin, and Rust use very very different strings than Java, Ruby, D, and other languages. Java strings are absolutely terrible, for example, because they automatically do a bunch of memory allocations. Golang strings can do auto-memory allocations, but golang introduces this fancy new feature that C doesn't really have called slices - a pointer and a size. With a slice, you can parse a string by slicing what you need out of the string without any memory allocations.

Odin, Rust, and Zig take the slices idea even further by saying - hey, how about every single string is a slice of bytes!

This is what I mean when I say I feel like you have no experience in the new languages. Strings in these recent languages are very different and are actually very performant because they are slice-based. And yes, you can slice a fixed array of bytes, and so yes, you can have a string (a byte-slice) of a fixed string or even a C-string.

I find it funny, however, that the very ideas you are talking about: data-oriented programming, keeping in mind memory allocations and memory management, and performance and space optimization, are the very design considerations of Odin and Zig. Odin literally comes from the "Handmade Network" community that was associated with Casey Muratori, lmao.

Odin emphasizes memory allocation so much it makes memory allocation a very central focus of the language by offering a way to switch out the default allocator and temp allocator, and offering various types of allocators in its standard library.

=> 🚀 clseibold [OP] · Jan 17 at 23:01:

@stack One final thing: the strings libraries that I keep mentioning in some of the languages don't necessarily do any memory allocations. Odin's strings library does almost no memory allocations, and strings, as I mentioned above, are just byte slices.

The point of a strings library is to make your job easier in that you don't have to rewrite a bunch of the same crap over and over again. Odin's strings library has functions for interpreting UTF-8 text from a byte-array, advancing to the next character in UTF-8 text, converting it to a unicode codepoint, checking what type of character that codepoint is, and the list goes on.

And if you want to build strings efficiently, then you would use a strings builder, which Rust, Golang, and Odin all have. So again, we have interning, slices, advancing characters in utf-8 texts, and string builders, and lots of other unicode-specific things as well.
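To make the string-builder idea concrete, here is a small sketch in C that appends into caller-owned storage and never allocates. The `str_builder` type and the `sb_*` functions are hypothetical names; the builders in Rust, Golang, and Odin have their own APIs and growth strategies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical builder: appends into caller-owned storage, never allocates. */
typedef struct {
    char  *buf;
    size_t cap;   /* total bytes available, including the terminating NUL */
    size_t len;   /* bytes written so far, excluding the NUL */
} str_builder;

static void sb_init(str_builder *sb, char *storage, size_t cap)
{
    assert(cap > 0);
    sb->buf = storage;
    sb->cap = cap;
    sb->len = 0;
    sb->buf[0] = '\0';
}

/* Append n bytes. Returns 1 on success, or 0 (leaving the builder unchanged)
   if the bytes plus the terminating NUL would not fit. */
static int sb_append(str_builder *sb, const char *data, size_t n)
{
    if (n >= sb->cap - sb->len) return 0;
    memcpy(sb->buf + sb->len, data, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
    return 1;
}
```

Because the length is tracked, each append costs only the bytes copied, rather than rescanning the whole string the way repeated `strcat` calls do.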

The biggest assumption that I feel like I keep seeing is that we have not advanced past Object Oriented languages in how we deal with things in an efficient way. OOP languages handled strings very inefficiently, but that was the 2000s and 2010s. We're past the 2000s and modern languages are becoming less object-oriented, lol.

There are also lots of applications where you want strings with no maximum size limit, and some applications for strings where you don't know the size until you are given the size.

=> 🚀 stack · Jan 18 at 00:07:

You are correct, I know very little about modern languages as I usually find something so offensive that the rest is not worth it.

Microsoft Word used piece tables back in the 80s, a form of slicing, which was both wonderful and too complex, leading to unrecoverable crashed data.

Slicing strings is hardly a new idea, and can be useful in certain situations -- along with the immutable data structures that some new languages favor. However, adopting that as the hammer to use on everything is also problematic and often leads to serious complexity, which of course requires interaction via more libraries...

Same with interning -- Lisp has been doing it for more than half a century, but it solves only some problems.

=> 🚀 stack · Jan 18 at 01:41:

As far as crypto, 100% with you. In fact, it should probably be a kernel service, not a library.

=> 🚀 stack · Jan 18 at 03:09:

And apologies for hijacking your thread. I simply wished your code was accessible to me, and got triggered into a weirdly inappropriate C apologism!

=> 🚀 clseibold [OP] · Jan 18 at 08:20:

@stack It's fine about the hijacking. I don't really mind.

Obviously modern languages aren't perfect, but I find that their benefits significantly outweigh C's downsides. And yet Golang has some things I absolutely hate, and a GC; Rust is so complicated you have to spend a month relearning programming; and Zig's syntax is so awful I will never touch Zig.

About the slices and interning, I'm well aware that they are not new, and this is precisely my point - C and C programmers got rid of old ideas that were useful.

Part of it was because C has to be so low level and compact, but over time as computers and programmers evolved, C didn't evolve with them. That's a very big problem the C standards committee is only now trying to solve, lol.

The fact that slices are integrated into the languages means they work exactly like arrays. There is almost no practical difference between a slice and an array. An array is just a pointer with a compile-time size. A slice is a pointer with a runtime size. Any array can become a slice, but a slice can't necessarily become an array, unless its backing store is already an array. This concept is very important, because it means you can use slices like arrays. Everything you can do with byte arrays you can do with byte slices, and you can slice into arrays.

There is also no difference between using a slice vs just pointing into a string array and storing a size - that is in fact a slice!
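That "pointer plus size" view can be sketched directly in C. The `str_slice` type and the helper names below are hypothetical, chosen only for illustration; the point is that splitting a string into tokens needs no allocation and no copying, because each token just points into the original bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a pointer plus a runtime length, no ownership. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_slice;

/* View a whole NUL-terminated string as a slice (no copy). */
static str_slice slice_from_cstr(const char *s)
{
    assert(s != NULL);
    str_slice out = { s, strlen(s) };
    return out;
}

/* Split off the token before the first `sep` and advance `rest` past it.
   Nothing is allocated or copied; the token points into the original bytes. */
static str_slice slice_next_token(str_slice *rest, char sep)
{
    str_slice tok = { rest->ptr, 0 };
    while (tok.len < rest->len && rest->ptr[tok.len] != sep)
        tok.len++;
    /* Skip the separator too, if we stopped on one. */
    size_t consumed = tok.len < rest->len ? tok.len + 1 : tok.len;
    rest->ptr += consumed;
    rest->len -= consumed;
    return tok;
}

/* Compare a slice against a C string literal. */
static int slice_equals(str_slice s, const char *lit)
{
    size_t n = strlen(lit);
    return s.len == n && memcmp(s.ptr, lit, n) == 0;
}
```

For example, splitting `gemini://bbs.geminispace.org` on `':'` yields the token `gemini` and leaves `//bbs.geminispace.org` in `rest`, both as views into the same buffer.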

Because strings are slices, you can use any backing memory for them. You can put a string in a contiguous fixed-size array, in a "vector" (stretchy buffer, etc.), etc. You can also reference static data in the executable.

With all of that in mind, I do not see how using slices is any worse than basing every string off an array, or a vector. In actuality, the benefit of slices is that you can have a string that references into any memory allocation. In Java, your string has to be a "vector" (stretchy buffer), not so in Odin, Zig, or Rust!

But slices are not the only tool used to deal with strings. You can get bytes out of strings, codepoints, etc. All of this is very simple to do when you have slices - you can literally create a slice that references the few bytes that make up a character, or multiple codepoints (which is also sometimes necessary, as one codepoint doesn't equal one glyph), and you do this without any memory allocations or copying of data. And then if you want, you can convert that to a rune, a 32-bit integer that represents the codepoint(s) of the character(s).
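As a rough illustration of pulling one codepoint out of a byte slice without allocating, here is a UTF-8 decoder sketch in C. The function name `utf8_decode` and its return convention are assumptions made for this example; it is not the API of Odin's or any other language's strings library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from the first `len` bytes at `p`.
   Stores the code point in *out and returns the number of bytes consumed,
   or 0 if the input is empty, truncated, or malformed. */
static size_t utf8_decode(const unsigned char *p, size_t len, uint32_t *out)
{
    assert(p != NULL && out != NULL);
    if (len == 0) return 0;

    unsigned char b0 = p[0];
    size_t n;
    uint32_t cp, min;

    if (b0 < 0x80)                { *out = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;                /* stray continuation or invalid lead byte */

    if (len < n) return 0;        /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

    *out = cp;
    return n;
}
```

Calling it in a loop and advancing the pointer by the returned count walks a string one codepoint at a time; the bytes just consumed are themselves a sub-slice of the original buffer.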

So... slices, imo, are the proper way to handle strings, because they just point to any memory location, so they do not dictate allocation. That doesn't mean there aren't other tools.

Because Odin makes all strings slices, it actually forces you into dealing with the underlying data of strings, rather than just leaving it up to a library to do a bunch of allocations itself, lol. In Odin, string literals are all static data, but aside from that, you have to manually allocate some backing buffer to store any of your other strings. So the idea that you are talking about with having to think about memory is essentially required because strings are slices, and string literals are immutable.

In Golang, strings can be used and it will auto-allocate a backing buffer for you, but this is not so in Odin. This is why I think Odin is the best competitor to C in terms of lower-level programming.

Also, I believe piece tables are basically coming back because they have some nice characteristics that play well with live multi-user text editors. But anyways, slices don't dictate that you have to use piece tables, and piece tables are just one application of slicing.
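For anyone who has not seen one, a piece table can be sketched as an ordered list of slices. The version below is a deliberately simplified assumption (fixed capacity, insert only, caller-owned text, hypothetical `pt_*` names), not how Word or any real editor implements it: an insert never moves the document text, it only splits one slice and adds another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PIECES 32   /* hypothetical fixed capacity */

typedef struct { const char *ptr; size_t len; } piece;   /* a slice */

typedef struct {
    piece  pieces[MAX_PIECES];
    size_t count;
} piece_table;

/* Start with one piece covering the whole original text. */
static void pt_init(piece_table *pt, const char *original)
{
    assert(original != NULL);
    pt->pieces[0].ptr = original;
    pt->pieces[0].len = strlen(original);
    pt->count = 1;
}

/* Insert caller-owned `text` at byte offset `pos`. Returns 1 on success,
   0 if `pos` is past the end or the table is full. */
static int pt_insert(piece_table *pt, size_t pos, const char *text)
{
    size_t i = 0;
    while (i < pt->count && pos > pt->pieces[i].len) {
        pos -= pt->pieces[i].len;
        i++;
    }
    if (i == pt->count) return 0;               /* offset past the end */

    piece cur = pt->pieces[i];
    int split = pos > 0 && pos < cur.len;       /* inserting mid-piece */
    size_t extra = split ? 2 : 1;
    if (pt->count + extra > MAX_PIECES) return 0;

    size_t at = (pos == 0) ? i : i + 1;         /* index of the new piece */
    memmove(&pt->pieces[at + extra], &pt->pieces[at],
            (pt->count - at) * sizeof(piece));
    pt->pieces[at].ptr = text;
    pt->pieces[at].len = strlen(text);
    if (split) {
        pt->pieces[i].len = pos;                /* left half */
        pt->pieces[at + 1].ptr = cur.ptr + pos; /* right half */
        pt->pieces[at + 1].len = cur.len - pos;
    }
    pt->count += extra;
    return 1;
}

/* Flatten into `out`. Returns the length, or 0 if `cap` is too small. */
static size_t pt_copy(const piece_table *pt, char *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < pt->count; i++) {
        if (n + pt->pieces[i].len >= cap) return 0;
        memcpy(out + n, pt->pieces[i].ptr, pt->pieces[i].len);
        n += pt->pieces[i].len;
    }
    out[n] = '\0';
    return n;
}
```

The bookkeeping in `pt_insert` hints at where the complexity comes from once deletion, undo, and persistence are added on top.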

=> 🚀 stack · 14 hours ago:

I don't think saying that C failed to 'evolve' with modern programming practices is a fair way of looking at it.

In my mind, C is basically a hammer -- a way to amplify and focus your energy onto a particular point or plane, without restrictions. It takes a bit of effort to figure out how to hit, and even more effort to figure out what and where to hit, not to mention when not to hit something with a hammer.

Hammers have not changed much in centuries for a good reason -- for those who know how to use them, adding fancy features and safety devices only makes them worse.

Sure, you can damage your wrist, smash a finger, or kill someone with a hammer -- or break your face by hitting a rubber tire. The product may be extremely ugly if you blindly hammer at a chunk of metal. But when used properly by even mediocre users, the results are usable.

So while adding fancy features to add 'safety', implicit handling of complicated immutable datastructures, etc. may seem like a convenience, like all training wheels it's fun right now, but a year later you may find yourself wondering why because you still can't ride a bike.

So that is why I think C will not go away in spite of calls to replace it by a modern language (and the fact that Linus is bending to accommodate the mob is making me feel that Linux may fall apart). C just does not get in the way too much.

As for slices, they are a very useful abstraction. But as a devil's advocate, I have to say that SSE instructions handling 0-terminated strings are several times faster than those working on explicitly-sized strings -- because they do not need to simultaneously track the length.

Original Post

=> 🚀 clseibold

SIS Progress Update: One Year Later -- SIS is my server "suite" software that allows you to set up different smallnet servers from within a gemini admin interface. It is called Smallnet Information Services, and it was inspired by IIS. After a year and a couple of months of work, I feel it is about a couple weeks out from a beta release, which I am excited about. I have finally mostly finished a major refactor update for SIS that will better manage certificates, hosts, and setting up servers,...

=> 💬 14 comments · 3 likes · Jan 15 · 4 days ago

Proxy Information
Original URL
gemini://bbs.geminispace.org/u/stack/23966