• 0 Posts
  • 85 Comments
Joined 2 years ago
Cake day: June 21st, 2023

  • According to cppreference:

    Unless otherwise specified, all standard library objects that have been moved from are placed in a “valid but unspecified state”, meaning the object’s class invariants hold (so functions without preconditions, such as the assignment operator, can be safely used on the object after it was moved from)

    I would expect this to be true of all types. An easy way to do this is to null out an internal pointer, set an internal fd to a sentinel value, etc., and check for that where needed, but this can easily become a source of errors if someone isn’t paying attention.

    Ideally, using a value after it has been moved would be caught statically, but that’s just my Rust brain speaking.
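    The sentinel idea isn’t specific to C++. Here is a minimal sketch of it in Python, using a hypothetical FdHandle class (not a real library type): take() plays the role of a C++ move constructor and leaves the source holding the sentinel -1, methods with preconditions check for it, and close(), which has no precondition, is always safe to call.

```python
import os

INVALID_FD = -1  # sentinel meaning "this handle owns nothing"


class FdHandle:
    """Hypothetical owning wrapper around a raw OS file descriptor."""

    def __init__(self, fd: int) -> None:
        self._fd = fd

    def take(self) -> "FdHandle":
        """Transfer ownership to a new handle, like a C++ move.

        The source stays valid but empty: its fd becomes the sentinel.
        """
        moved = FdHandle(self._fd)
        self._fd = INVALID_FD
        return moved

    def is_valid(self) -> bool:
        return self._fd != INVALID_FD

    def fileno(self) -> int:
        # Operations with preconditions must check for the sentinel.
        if not self.is_valid():
            raise ValueError("handle was moved from or already closed")
        return self._fd

    def close(self) -> None:
        # No precondition: on a moved-from handle this is a no-op.
        if self.is_valid():
            os.close(self._fd)
            self._fd = INVALID_FD


# Usage: a pipe gives us two real descriptors to play with.
read_fd, write_fd = os.pipe()
source = FdHandle(read_fd)
dest = source.take()  # source now holds INVALID_FD
source.close()        # no-op, so no double close
dest.close()          # actually closes read_fd
os.close(write_fd)
```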



  • Where do you draw the line on “smart” features? Tab should not add indent spaces? Encoding or newline mechanisms? Determining EOF newline?

    For a very basic default editor, I would expect it to save only what I typed: no “smart” features, no IDE features, nothing else. It should use CRLF for newlines (on Windows), with at most a setting to change that for the current session.

    Basically, I wouldn’t expect anything more than what nano does. If I want a fancy CLI editor, I’ll install one. At its core, though, it should exist only to edit the text content of a text file and nothing else. In my opinion, it should be as stable as possible and as narrow in scope as possible.

    With that said, basic text editing features like undo/redo and cut/copy/paste would be nice. Bonus points if it even works with the system clipboard.

    Edit: on the question of whether a trailing newline should be added automatically, Windows has no requirement that text documents end with a newline, so I would not expect one. What tools written for POSIX environments do seems irrelevant here. If a valid text document in POSIX must end with a newline, then a text editor there would naturally be expected to add one, or at least support adding one, but that has nothing to do with Windows.
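    As a rough illustration of “save only what I typed”, here is a minimal sketch of a hypothetical save-time helper (normalize_newlines is not taken from any real editor) that converts every line ending to the configured style and never appends a trailing newline.

```python
import re

# CRLF is matched first so "\r\n" counts as one line ending, not two.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def normalize_newlines(text: str, newline: str = "\r\n") -> str:
    """Convert every line ending in `text` to `newline`.

    Nothing is added or removed: a buffer without a final newline
    is saved without one.
    """
    return _LINE_ENDING.sub(newline, text)


print(repr(normalize_newlines("first\nsecond")))   # 'first\r\nsecond'
print(repr(normalize_newlines("a\r\nb\n", "\n")))  # 'a\nb\n'
```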


  • The only part of this process I’d consider automating with an LLM is summarizing the changes, and even then I’d only be interested in looking at a suggested changelog, not something fully automated.

    It’s amazing to me how far people will go to avoid writing a simple script. Thankfully, determinism isn’t a requirement for a release pipeline. You wouldn’t want all of your releases to go smoothly. That would be no fun.




  • But how can we then ensure that I am not adding/processing products which are already in the “final” table, when I have no knowledge about ALL the products which are in this final table?

    Without knowing your schema, I can’t fully answer this. However, the database doesn’t need to scan every row in a table to check whether a value exists if you build an index on the relevant columns. If your products have some unique ID (or a unique combination of columns), you can usually build an index on those values, which means the DB maintains what is essentially a lookup table for those columns.

    Without going into too much detail, you can think of an index as a way for a DB to make a “contains” (or “retrieve”) operation go from O(n) (checking every row) to something much faster, such as O(log n) for a B-tree index. The tradeoff is that the index takes up extra space.

    An added benefit is that uniqueness constraints can be enforced cheaply on indexed columns if needed. And yes, your primary key is indexed by default.
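    The same idea can be demonstrated with SQLite through Python’s built-in sqlite3 module (in Postgres, the equivalent tool is EXPLAIN). This sketch uses made-up table and index names: it asks SQLite for its query plan before and after adding a unique index, and the lookup changes from a full scan to an index search. The same index then rejects a duplicate SKU.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products (sku, name) VALUES (?, ?)",
    [(f"SKU-{i}", f"Product {i}") for i in range(10_000)],
)

LOOKUP = "SELECT 1 FROM products WHERE sku = ?"


def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("SKU-42",)).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(LOOKUP)  # e.g. "SCAN products": every row is checked
conn.execute("CREATE UNIQUE INDEX idx_products_sku ON products (sku)")
after = plan(LOOKUP)   # e.g. "SEARCH products USING COVERING INDEX ..."
print(before)
print(after)

# The unique index also enforces "no duplicate products" for you.
try:
    conn.execute("INSERT INTO products (sku, name) VALUES ('SKU-42', 'Dupe')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

    The exact plan wording varies between SQLite versions, which is why the comments say “e.g.”.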

    Read more about indexes in Postgres’s docs. In my experience, it has pretty readable documentation. Or read a book on indexes, watch a video, etc. The concept is universal.

    May you elaborate what you mean with read replicas? Storage in memory?

    This highly depends on your needs. I’ll link PG’s docs on replication though.

    If you’re migrating right now, I wouldn’t think about this too much. Replicas are basically copies of your database hosted on different servers (ideally in different data centers, or even different regions if possible). They stay in sync with one another, and depending on the kind of replica and the kind of query, any replica may be able to handle an incoming query, rather than everything going through a single central database.
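    To make “any replica may be able to handle a query” a bit more concrete, here is a deliberately simplified sketch of application-side routing: writes go to the primary, and reads rotate round-robin across replicas. ReplicaRouter and the host names are hypothetical, and deciding “is this a read?” by looking for a leading SELECT is naive (it misses CTEs, for example). This kind of routing can also live in a proxy or in the driver/ORM instead.

```python
from itertools import cycle


class ReplicaRouter:
    """Hypothetical router: writes hit the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # With no replicas configured, every read falls back to the primary.
        self._reads = cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        # Naive read detection: fine for a sketch, not for production.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._reads) if is_read else self.primary


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
for query in [
    "SELECT * FROM products",       # db-replica-1
    "INSERT INTO products VALUES (1)",  # db-primary
    "SELECT count(*) FROM orders",  # db-replica-2
]:
    print(router.target_for(query))
```

    One consequence to keep in mind: asynchronous replicas can lag slightly behind the primary, so a read sent to a replica right after a write may not see that write yet.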

    If all you need are backups, though, replicas could be overkill. Either way, you usually don’t want all of your prod data stored on a single machine. I would talk to your management about backup requirements and potentially availability/uptime requirements.


  • Pronouns are pointers. “Let us (let’s) move it over there.” Both “us” and “it” refer indirectly to something else by a new name. As with pointers, the pointees are defined by some context external to that sentence/statement (usually earlier sentences/statements or some other actions). The meaning of “us” and “it” can also change in different contexts, so those words are not bound to one value (and “rebinding” them by changing the context does not change the values they were previously bound to).
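    The analogy maps neatly onto references in code. In this small Python sketch (the names are just illustrations), “it” is a name bound to an object: acting through it changes the pointee, and rebinding it in a new context leaves the old pointee untouched.

```python
# Two things a sentence might be talking about.
box = {"name": "box", "location": "here"}
crate = {"name": "crate", "location": "here"}

it = box                  # context 1: "it" refers to the box
it["location"] = "there"  # "move it over there" acts on the pointee
print(box["location"])    # there

it = crate                # context 2: "it" is rebound to the crate
print(box["location"])    # still "there": rebinding didn't touch the box
print(crate["location"])  # here: the crate hasn't been moved
```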


  • This seems like the same problem that lifetimes solve in Rust: tracking when values are no longer used and thus fall “out of scope”. Automated tooling should really be doing lifetime analysis of these values, and that seems well beyond what GenAI can be trusted to do.

    If this is such a huge problem, are you able to create finalizers that close the resources instead, or better abstractions for managing the lifetimes of these resources? I don’t write Java anymore, but this seems like a problem better solved by other tools.
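    As an illustration of the “better abstraction” route, here is a sketch in Python rather than Java: a hypothetical ManagedResource that is closed deterministically by a with block (the role try-with-resources plays for Java’s AutoCloseable), with weakref.finalize as a safety net if someone forgets to close it. Java’s closest equivalent of that safety net is java.lang.ref.Cleaner.

```python
import weakref


def _release(name: str, log: list[str]) -> None:
    # Module-level on purpose: the finalizer must not hold a reference to
    # the resource itself, or the resource could never be collected.
    log.append(f"released {name}")


class ManagedResource:
    """Hypothetical resource with deterministic close plus a GC safety net."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        # Runs _release at most once: on close(), at garbage collection,
        # or at interpreter exit, whichever comes first.
        self._finalizer = weakref.finalize(self, _release, name, log)

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive

    def close(self) -> None:
        self._finalizer()  # calling an already-dead finalizer is a no-op

    def __enter__(self) -> "ManagedResource":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


events: list[str] = []
with ManagedResource("db-connection", events) as conn:
    pass  # use the resource here
print(conn.closed, events)  # True ['released db-connection']
```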


  • If you are new to something and want to learn, seek resources and educate yourself with them. Learning takes time, and there are no shortcuts.

    A hot DB should not run on HDDs. Slap some NVMe storage into that server if you can. If you can’t, consider getting a new server and migrating to it.

    SQL Server can generate execution plans for you. Generate them for your queries and check whether any operations iterate over the entire table. You should avoid scanning an entire table with a huge number of rows when possible, at least during requests.

    If you want some kind of dupe protection, let the DB do it for you. Create an index and a uniqueness constraint on the relevant columns. If the data is too complex for that, find a way to make it work, such as normalizing it (sorting lists/dicts, etc.) and generating and storing hashes, just so that the DB can do the work for you. The DB is better at enforcing constraints than you are (when it can do so).
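    One way to apply the “normalize, hash, and let the DB enforce it” idea, sketched with Python’s sqlite3 as a stand-in for your database: canonicalize each record (here JSON with sorted keys and a sorted tag list, which is an assumption about what counts as “the same” record), store its SHA-256 in a UNIQUE column, and let the DB ignore duplicates. The table, column, and function names are made up. SQLite’s INSERT OR IGNORE is used here; other databases have their own syntax for this (Postgres, for example, uses ON CONFLICT DO NOTHING).

```python
import hashlib
import json
import sqlite3


def fingerprint(record: dict) -> str:
    """Hash a canonical form so equivalent records get the same key."""
    # Assumption: tag order doesn't matter, and missing tags equal no tags.
    canonical = dict(record, tags=sorted(record.get("tags", [])))
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (fingerprint TEXT NOT NULL UNIQUE, data TEXT NOT NULL)"
)


def insert_if_new(record: dict) -> bool:
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (fingerprint, data) VALUES (?, ?)",
        (fingerprint(record), json.dumps(record)),
    )
    return cur.rowcount == 1  # 0 means the DB skipped a duplicate


print(insert_if_new({"sku": "A1", "tags": ["red", "large"]}))  # True
print(insert_if_new({"tags": ["large", "red"], "sku": "A1"}))  # False: same record
```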

    For read-heavy workflows, consider whether caches or read replicas will benefit you.

    And finally back to my first point: read. Learn. There are no shortcuts. You cannot get better at something if you don’t take the time to educate yourself on it.



  • For your second part:

    A lot of open source projects exist to make people’s lives easier at work. The people developing these projects are often also people who have jobs as devs and have a use for the projects. It just so happens that it’s easier to use these libraries at work and share them with others when they’re more permissively licensed, and there are community benefits when people all contribute back to them.

    There’s nothing wrong with wanting to go the AGPL route and forcing everyone into open source, but that makes it much harder to use these tools at work, which often kills the motivation behind building them in the first place.

    I tend to be of the opinion that community tools should be GPL/AGPL, while libraries can be anything. It works as a compromise for both sides: devs have an easier time at work, while changes to community-developed tools still have to be contributed back.

    Edit: I should also mention dual licensing (AGPL plus a paid commercial license). That model is probably my favorite, but unfortunately it’s uncommon.


  • TehPers@beehaw.org to Programming@programming.dev: The Innocent Loop (2 months ago)

    This seems to me more like a complaint about JS’s functional array methods being eager than a complaint about loops. All of this is solved in languages with lazy iterators (or async iterators, for a potentially better solution).

    For example, in C#, .Where does nothing immediately on enumerable types (execution is deferred until you iterate), and with IAsyncEnumerable you can go as far as streaming/chunking results from a DB into the enumerable and filtering them asynchronously as they arrive, just by consuming it with await foreach. In Rust, you get the same options (though more verbosely) using streams from the futures crate.

    Edit: the rest of the article doesn’t really have much to do with loops, but these are all solved problems in C#, Rust, and, I assume, other languages too. Even Python has lazy iterables, and you can configure your thread pools however you want for concurrency.
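    Python generators show the same deferred execution in a few lines. In this sketch, expensive_check is a hypothetical stand-in for real per-item work: building the filtered generator runs nothing, and taking the first two matches evaluates only four items out of a million, whereas an eager JS .filter() would call its predicate on every element first.

```python
from itertools import islice

checked: list[int] = []


def expensive_check(n: int) -> bool:
    checked.append(n)  # record which items were actually evaluated
    return n % 3 == 0


matches = (n for n in range(1_000_000) if expensive_check(n))
print(checked)     # []: nothing has run yet

first_two = list(islice(matches, 2))
print(first_two)   # [0, 3]
print(checked)     # [0, 1, 2, 3]: only four checks
```

    The asynchronous counterpart is an async generator consumed with async for, which fills roughly the same role as C#’s await foreach.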


  • For your goals, I would stick with Python unless you want to learn another language. There’s not much value in switching away when all the tools you need are primarily designed for Python.

    As far as functional programming goes, in my experience with AI work you’re generally more concerned with orchestrating services than with FP. For example, run input through model #1, then based on the output, run one of these other 3 models (or several of them in parallel), then eventually pass everything into another service/function to aggregate and format the outputs. You can think of each of these as a “function”, but they’re much higher level than what you’d traditionally consider functions in FP, and more along the lines of microservices.
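    Here is a rough sketch of that kind of orchestration with asyncio. Every model call is a hypothetical placeholder (a real pipeline would call model services over the network), and the routing rule is made up: a first “router” step picks which downstream models to run, those run concurrently, and a final step aggregates their outputs.

```python
import asyncio


# Hypothetical placeholders for calls to model services.
async def summarize(text: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"{len(text.split())} words"


async def sentiment(text: str) -> str:
    await asyncio.sleep(0.01)
    return "neutral"


async def translate(text: str) -> str:
    await asyncio.sleep(0.01)
    return text.upper()  # obviously not a real translation


HANDLERS = {"summarize": summarize, "sentiment": sentiment, "translate": translate}


async def route(text: str) -> list[str]:
    """Model #1: decide which downstream models to run (made-up rule)."""
    await asyncio.sleep(0.01)
    return ["summarize", "sentiment"] if len(text) > 20 else ["summarize"]


def aggregate(results: dict[str, str]) -> str:
    """Final step: combine and format the outputs."""
    return "\n".join(f"{name}: {out}" for name, out in sorted(results.items()))


async def pipeline(text: str) -> str:
    chosen = await route(text)
    # Run the chosen models concurrently; gather preserves the input order.
    outputs = await asyncio.gather(*(HANDLERS[name](text) for name in chosen))
    return aggregate(dict(zip(chosen, outputs)))


print(asyncio.run(pipeline("this input is definitely longer than twenty chars")))
```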



  • Some of these key findings seem a bit overblown. The number of domains it keeps persistent connections to shouldn’t really matter; one is enough. Update checks are standard for software. Unique IDs/device fingerprinting are so common that browsers build in ways to try to prevent them at scale. JWTs are standard authentication tools, so who is the security concern for? ByteDance? Or are you saying the JWTs are from the local machine? And MessagePack isn’t exactly a secret format either.

    The TL;DR of this seems to be that ByteDance’s AI IDE collects a crazy amount of data and offers free AI services in exchange. I’m not really sure why you’d want those services, especially at the cost of all your code potentially being stolen or other data being collected, but it should be obvious that nothing in this world is truly free.