How I got from Lelux Mk.2 to Pytci
Many people found Pytci a great joke that uplifted their day. I'm glad that you did. I think that where C++ is the obviously good choice, Python is the non-obviously good choice for a compiler platform.
You may think it's the underdog, but it will lower your pants and fling you to the moon by your cock, and you'll arrive while you're still laughing at how puny the contender is.
If you haven't noticed it yourself yet: in the future, the complexity of the system is going to hold you back. It's not going to be the performance or even the correctness of a program.
If you haven't noticed, computers have become more powerful faster than they've become more complex. Sure, the x86 instruction set is a mess compared to something from 1992. But computers are as fast as, or faster than, science fiction movies predicted in 1985. Computing performance is one of the few things that actually manages to outmatch the fiction written about it.
Computing speed is becoming abundant, but computers aren't becoming simpler. Programmers' time isn't getting cheaper. Besides, not everybody wants to be a programmer. Highly trained consultants especially aren't becoming cheaper.
You love to joke about "Fast, Good, Cheap", or "performance, correctness, development time: pick one." In reality it's probably something more like a great rhombihexahedron.
Somehow 99% of the people who know that programming languages exist still think that only performance matters. In fact, you can't obtain great performance without discarding a large share of your available platforms. Most of the time you trade memory for speed, or the other way around. Almost always you trade simplicity for performance.
Sure, there are situations where you absolutely need the performance or the correctness. Sure, do your dirtiest tricks to obtain it there. But in the general case, it won't be long before you can't even ensure correctness, because you need to be able to understand what you're doing and do it within budget constraints.
A small sheet for why you shouldn't rely on C++ in your compiler project
I don't even have an exhaustive list.
- C was designed to allow programs written in C to run fast, at the cost of reliability, development speed, safety or correctness. It punishes you wonderfully if you're not paying attention by dropping your null checks.
- C++ attempts to make programs in this language easier to write and maintain through template metaprogramming. It has had some success, and it became popular because great unix tools are interoperable and C++ just extended C in the beginning.
- Programs written in C++ take ages to compile because compilers do a lot of redundant and pointless work in compiling them. The issues arise from templating combined with the backwards-compatible #include, the lack of runtime type information, and the desire to get both safety and performance in the same language. Apparently it is hard to design a compiler that avoids the work, because clang isn't such a compiler.
- Presentations about compiling C++ programs leave out the nasty details and present something that is only true for C.
- C++ is an obviously good choice for writing a new compiler. That's why LLVM has been written in C++. When something is too obvious, maybe you should keep your eyes open anyway.
The heroic story self-satisfied people write about themselves
I was improving Lelux Mk.2 before I created Pytci. My movement from one project to another tends to be erratic, but these two are connected.
I eventually managed to get Lelux where I wanted by using a premade GCC+binutils+musl-based cross-compiling toolchain that works very well. Before that, I tried to compile two mainstream compilers and studied the internals of the Tiny C Compiler. Tcc was the only one I managed to cross-compile with little work, but it contained a bug that let relocation info into already relocated symbols. Even though the self-compiled tcc worked just fine on glibc, musl-libc refused to run programs with symbol entries it cannot relocate.
Cyclic build-time dependencies
Setting up a compiler for your new Linux distribution brings some major headaches. The compilers themselves compile just fine, although you have to be careful about which files they work on, especially while cross-compiling. But to compile anything, GCC and Clang require support libraries.
Glibc has a dependency on libgcc. Libgcc is part of GCC. It is not obvious how to compile or install libgcc before everything else.
Clang didn't originally have its own support libraries. It's been only a few years since they managed to compile clang with clang. They have actual libraries now: libunwind, libcxxabi and libcxx, but they've been written in C++. At build time, libcxx depends on the presence of another libcxx. If you don't have a libcxx library in the system root you're compiling to, the build of these libraries fails. As icing on the cake, they had some cyclic build-time dependencies too.
It's not impossible to get yourself a bootstrapping compiler from either clang or gcc. It just takes disproportionately longer than anything else in your project, especially if you're trying to go small.
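To make the cycles described above visible, here is a minimal sketch of a depth-first cycle finder. The two dependency tables are a simplified, hypothetical reading of this section, not real build metadata: the `gcc` to `glibc` edge and the `clang` to `libcxx` edge are my assumptions, and `find_cycle` is a name made up for this example.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {name: WHITE for name in deps}
    stack = []                     # the path currently being explored

    def visit(name):
        color[name] = GREY
        stack.append(name)
        for dep in deps.get(name, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[name] = BLACK
        return None

    for name in list(deps):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical, simplified build-time dependencies.
GCC_WORLD = {
    "glibc": ["libgcc"],   # from the text: glibc depends on libgcc
    "libgcc": ["gcc"],     # from the text: libgcc is part of GCC
    "gcc": ["glibc"],      # assumption: a full gcc build wants a libc
}
CLANG_WORLD = {
    "clang": ["libcxx"],   # assumption: clang needs its support libraries
    "libcxx": ["libcxx"],  # from the text: libcxx wants a libcxx in the sysroot
}

print(find_cycle(GCC_WORLD))    # ['glibc', 'libgcc', 'gcc', 'glibc']
print(find_cycle(CLANG_WORLD))  # ['libcxx', 'libcxx']
```

A topological sort would refuse both tables for the same reason: there is no order in which everything can be built from nothing, so something has to be brought in from outside first.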
...And tcc was broken, and I looked into the sources in order to fix it...
The problem is that the time I had allotted for this has already been exhausted. I have to return to what I was doing prior to upgrading Lelux. I dropped the ball by pointing out the issues on the tcc mailing list instead of attempting to fix them. But I was considering what I had found and learned:
- Unix systems are amazing. If something doesn't work well for you, you can replace it and leave in the parts you somewhat tolerate because they're familiar. Small steps by many people make those systems great.
- Great unix tools have interoperability with what already exists.
- If the small pieces are dysfunctional or annoying, you don't need to tolerate their worst behaviors.
- If you want your tools to have maximum utility and to be replaceable when they become useless, then you make them in unix style.
In short, you shouldn't make a horsewhip that glues itself to the user's hand, unless you maintain a dungeon. And bestsellers aren't written just in Klingon, even if it weren't as striking.
The problems in tcc likely aren't due to undefined behavior of the C language. But C is a pretty bare language: a correct C program cannot be simple. You have to take shortcuts with C to satisfy your needs, and those shortcuts can be hair-raising. On the other hand, doing otherwise would be insane in such a language.
At this point I thought it'd still be quite non-obvious to write a compiler in Python. Did you know CPython cross-compiles like a snake? Well, you lose a few legs (sockets, ctypes), but that isn't fatal for a compiler.
Non-obviousness aside, it actually feels like a good choice. Translating the C specification into Python wasn't hard, at least for the parts that aren't too implementation-specific. Studying around a bit let me put more pieces together.
It is easy to come up with something easy to read, although the preprocessor may still need some work to be more useful. And there should be plenty of projects that will benefit from a good, modular C-capable compiler written in Python.
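To give a feel for how directly the early parts of the C specification map to Python, here is a minimal sketch of line splicing (translation phase 2) followed by a very rough split into preprocessing tokens. This is not Pytci's actual code. The names `splice_lines`, `tokenize_line` and `preprocessing_lines` are made up for this example, and the tokenizer is deliberately incomplete: it ignores comments and character constants and only knows single-character punctuators plus `##`.

```python
import re


def splice_lines(source):
    """Translation phase 2: delete every backslash immediately followed by a newline."""
    return source.replace("\\\n", "")


# Rough preprocessing-token shapes; multi-character operators such as '->'
# come out as separate single-character tokens in this sketch.
TOKEN = re.compile(r"""
    (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\.?\d(?:[eEpP][+-]|[\w.])*)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<punct>\#\#|[-+*/%=<>!&|^~?:;,.(){}\[\]\#])
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def tokenize_line(line):
    """Split one logical line into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise SyntaxError("unexpected character %r" % line[pos])
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def preprocessing_lines(source):
    """Yield the token list of every non-empty logical line."""
    for line in splice_lines(source).split("\n"):
        tokens = tokenize_line(line)
        if tokens:
            yield tokens


source = "#define AREA(w, h) \\\n    ((w) * (h))\nint a = AREA(2, 3);\n"
for tokens in preprocessing_lines(source):
    print(" ".join(text for kind, text in tokens))
```

Because the splice happens before the split into lines, the two-line `#define` reaches the tokenizer as a single logical line, which is what a directive parser wants to see.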
On my part, it was an impulsive move to post Pytci to reddit. I was taken by surprise that people found the project uplifting. I'm amazed by you people who started following Pytci, starred it or even forked it!
I have no idea why they did so. But I can keep guessing. I guess it'd be too cocky to continue and post this to reddit.