• gedhrel@lemmy.world
    40 upvotes · 6 months ago

    Casey’s video is interesting, but his example is framed as moving from 35 cycles/object to 24 cycles/object being a 1.5x speedup.

    Another way to look at this is that it’s an 11-cycle saving per object.

    If you’re writing a shader or a physics sim this is a massive difference.

    If you’re building typical business software, it isn’t; that 10,000-line monster method does crop up, and it’s a maintenance disaster.

    I think “clean code principles lead to a 50% cost increase” is a takeaway that needs a good deal of context.
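To put numbers on the framing above, here is the arithmetic using only the two figures quoted from the video (35 and 24 cycles/object). The labels are just descriptive names, not terms from the video:

```latex
\begin{align*}
\text{speedup} &= \frac{35}{24} \approx 1.46\times \\
\text{cycles saved per object} &= 35 - 24 = 11 \\
\text{extra cost of the slower version} &= \frac{35 - 24}{24} \approx 46\% \\
\text{time saved by the faster version} &= \frac{35 - 24}{35} \approx 31\%
\end{align*}
```

Whether 11 cycles per object matters depends on how many objects are processed per second, which is the point about context.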

    • bonus_crab@lemmy.world
      5 upvotes · 6 months ago

      For what it’s worth, the cache locality of Vec<Box<dyn Trait>> is terrible in general. I feel like if you’re iterating over a large array of things and applying a polymorphic function, you’re making a mistake.

      Cache locality isn’t a problem when you’re only accessing something once, though.

      So IMO polymorphism has its place for work that isn’t iterative compute, e.g. web server handler functions and event-driven systems.
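A minimal sketch of the two layouts being contrasted here. The `Shape` trait, the `Circle`/`Rect` types and the `ShapeEnum` alternative are hypothetical names chosen for illustration, not taken from the article. With `Vec<Box<dyn Shape>>` every element is a pointer to its own heap allocation, while a `Vec<ShapeEnum>` stores the variants inline in one contiguous buffer:

```rust
use std::f64::consts::PI;

// Dynamic dispatch: the Vec holds pointers, each shape lives in its own allocation.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { radius: f64 }
struct Rect { width: f64, height: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { PI * self.radius * self.radius }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.width * self.height }
}

// Enum dispatch: variants are stored inline, so the Vec is one contiguous buffer.
enum ShapeEnum {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

impl ShapeEnum {
    fn area(&self) -> f64 {
        match self {
            ShapeEnum::Circle { radius } => PI * radius * radius,
            ShapeEnum::Rect { width, height } => width * height,
        }
    }
}

// Walks a slice of boxed trait objects; each call goes through a vtable.
fn total_boxed(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Walks a slice of enums; each call is a match on the inline discriminant.
fn total_enum(shapes: &[ShapeEnum]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

Both functions return the same total for the same shapes. The sketch only shows the memory layout difference; it makes no claim about how large the speed difference is on any given machine.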

  • zweieuro@lemmy.world
    13 upvotes · 6 months ago

    Correct me if I am wrong, but isn’t “loop unrolling/unwinding” something that the C++ and Rust compilers do? Why does the loop here not get unwound?

    • Giooschi@lemmy.world
      English · 14 upvotes · 6 months ago

      Loop unrolling is not really the speedup; autovectorization is. Loop unrolling often helps with autovectorization, but it is not enough, especially with floating-point numbers. To vectorize an accumulation, the operation needs to be associative, and floating-point addition is not associative, i.e. (x + y) + z is not always equal to x + (y + z). Hence autovectorizing the code would change the semantics, and the compiler is not allowed to do that.
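The non-associativity is easy to reproduce. In the sketch below, `sum_sequential` adds left to right like the scalar loop, while `sum_two_lanes` keeps two independent accumulators, which is roughly the reordering a 2-lane vectorized loop would perform. Both function names are made up for this illustration:

```rust
/// Sums left to right, in the order the scalar loop is written.
fn sum_sequential(xs: &[f64]) -> f64 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Sums even and odd positions separately, then combines the two partial sums.
/// This mimics the reassociation a 2-lane vectorized accumulation would need.
fn sum_two_lanes(xs: &[f64]) -> f64 {
    let mut lanes = [0.0_f64; 2];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 2] += x;
    }
    lanes[0] + lanes[1]
}
```

For the input `[1e16, 1.0, -1e16, 1.0]` the sequential sum loses the first `1.0` when it is added to `1e16`, while the two-lane version cancels `1e16` against `-1e16` first and keeps both ones. The two results differ, which is why the compiler cannot make this change on its own under strict IEEE semantics.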

  • TehPers@beehaw.org
    English · 6 upvotes · 6 months ago (edited)

    I agree with the conclusion, and the exploration is interesting enough that I think it was worth sharing. The author seemingly knows this already, judging by their conclusion, but it’s worth stressing: these kinds of microbenchmarks rarely reflect real-world performance.

    This toy case doesn’t have many (if any) real world performance-sensitive applications. At best, using shapes in games comes to mind, but shapes there are often represented as meshes, and if you really need the area that much, you might find that precalculating the area once is more impactful on the performance than optimizing how fast the area is calculated.

    Still, the author seems aware, and it seems to just be the author sharing their fun experiment.

  • Turun@feddit.de
    6 upvotes · 6 months ago

    It would be interesting to see if an iterator instead of a manual for loop would increase the performance of the base case.

    My guess is not, because the compiler should know they are equivalent, but it would be interesting to check anyway.
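For anyone who wants to run that check, the variants to compare could look like the sketch below. `Rect` and the three function names are placeholders for illustration; the article's actual shape types are not reproduced here. All three add the areas in the same order, so they return the same value, and any timing difference would have to be measured with a benchmark:

```rust
struct Rect { width: f64, height: f64 }

impl Rect {
    fn area(&self) -> f64 { self.width * self.height }
}

/// Index-based loop, the most "manual" form.
fn total_indexed(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    for i in 0..shapes.len() {
        accum += shapes[i].area();
    }
    accum
}

/// `for ... in` loop over the slice.
fn total_for_in(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    for shape in shapes {
        accum += shape.area();
    }
    accum
}

/// Iterator adaptor chain.
fn total_iter_chain(shapes: &[Rect]) -> f64 {
    shapes.iter().map(Rect::area).sum()
}
```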

    • Deebster@programming.dev
      2 upvotes · 6 months ago

      I wonder if the compiler checks to see if the calls are pure and are therefore safe to run in parallel. It seems like the kind of thing the Rust compiler should be able to do.

      • TehPers@beehaw.org
        English · 5 upvotes · 6 months ago

        If by parallel you mean across multiple threads in some map-reduce algorithm, the compiler will not do that automatically since that would be both extremely surprising behavior and in most cases, would make performance worse (it’d be interesting to see just how many shapes you’d need to iterate over before you start seeing performance benefits from map-reduce). If you’re referring to vectorization, then the Rust compiler does automatically do that in some cases, and I imagine it depends on how the area is calculated and whether the implementation can be inlined.
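Since the compiler will not do the multi-threaded version for you, it has to be written by hand. A minimal sketch using only the standard library's scoped threads, assuming a hypothetical `Rect` type and a made-up `total_area_parallel` function:

```rust
use std::thread;

struct Rect { width: f64, height: f64 }

impl Rect {
    fn area(&self) -> f64 { self.width * self.height }
}

/// Hand-written map-reduce: split the slice into chunks, sum each chunk on its
/// own thread, then add up the partial sums.
fn total_area_parallel(shapes: &[Rect], num_threads: usize) -> f64 {
    // Treat 0 threads as 1, and never pass a chunk size of 0 to `chunks`.
    let n = num_threads.max(1);
    let chunk_size = ((shapes.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Map step: spawn one worker per chunk. `collect` forces all spawns
        // to happen before any join.
        let handles: Vec<_> = shapes
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(Rect::area).sum::<f64>()))
            .collect();

        // Reduce step: combine the partial sums.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Note that this changes the order of the floating-point additions, so the result may differ slightly from the sequential loop for arbitrary inputs. Thread startup costs also mean it is likely to be slower for small inputs, as the comment above suggests.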

    • onlinepersona@programming.dev
      English · 2 upvotes, 1 downvote · 6 months ago (edited)

      Do you mean this for loop?

      for shape in &shapes {
        accum += shape.area();
      }
      

      That does use an iterator:

      for-in-loops, or to be more precise, iterator loops, are a simple syntactic sugar over a common practice within Rust, which is to loop over anything that implements IntoIterator until the iterator returned by .into_iter() returns None (or the loop body uses break).

      Anti Commercial AI thingy

      CC BY-NC-SA 4.0
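The quoted description of `for` loops can be spelled out in code. The sketch below writes the loop from the comment next to a hand-expanded version that calls `into_iter()` and `next()` directly. `Rect` and both function names are assumptions for illustration, and the expansion is an approximation of what the compiler generates, not its exact output:

```rust
struct Rect { width: f64, height: f64 }

impl Rect {
    fn area(&self) -> f64 { self.width * self.height }
}

/// The loop as written in the comment.
fn total_for_in(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    for shape in shapes {
        accum += shape.area();
    }
    accum
}

/// Roughly what that loop expands to: obtain an iterator via `IntoIterator`,
/// then call `next()` until it returns `None`.
fn total_desugared(shapes: &[Rect]) -> f64 {
    let mut accum = 0.0;
    let mut iter = IntoIterator::into_iter(shapes);
    loop {
        match iter.next() {
            Some(shape) => accum += shape.area(),
            None => break,
        }
    }
    accum
}
```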

  • BB_C@programming.dev
    2 upvotes, 3 downvotes · 6 months ago

    No

    struct Shapes<const N: usize>([Shape; N]);
    
    impl<const N: usize> Shapes<N> {
        const fn area(&self) -> f64 { /* ... */ }
    }
    

    Bad article 🤨