Skip to content

Conversation

@kjeremy
Copy link
Contributor

@kjeremy kjeremy commented Jan 13, 2020

No description provided.

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

bors r+

bors bot added a commit that referenced this pull request Jan 13, 2020
2825: Some clippy lints r=matklad a=kjeremy



Co-authored-by: kjeremy <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jan 13, 2020

Build failed

  • Rust (windows-latest)

@kjeremy
Copy link
Contributor Author

kjeremy commented Jan 13, 2020

bors retry

bors bot added a commit that referenced this pull request Jan 13, 2020
2825: Some clippy lints r=matklad a=kjeremy



Co-authored-by: kjeremy <[email protected]>
@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

@kjeremy I'll disable windows gating. Would you be interested in looking into why this test fails on windows? It should pass I think, so there might be a real issue hidden there!

@kjeremy
Copy link
Contributor Author

kjeremy commented Jan 13, 2020

@matklad annoyingly it passes on my machine.

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

@kjeremy just to sanity check, you run with RUN_SLOW_TESTS=1?

@bors
Copy link
Contributor

bors bot commented Jan 13, 2020

Build succeeded

  • Rust (macos-latest)
  • Rust (ubuntu-latest)
  • Rust (windows-latest)
  • TypeScript

@bors bors bot merged commit c5c5f42 into rust-lang:master Jan 13, 2020
@kiljacken
Copy link
Contributor

@matklad I had a hypothesis that the windows workers, probably being multi-tenant, are a bit overworked and thus not the best place for timing sensitive tests. Why the situtation on linux, or especially macOS, is better, is beyond me. Maybe the windows scheduler is significantly less fair?

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

The tests test that we do O(1) of work, so it is not time-senstive in the success path

@kiljacken
Copy link
Contributor

@matklad It's diagnostics_dont_block_typing that's failing right? That's timing sensitive in my mind 🤷‍♂. But I don't see why it should be failing unless the workers are incredibly congested.

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

Right. The server.request::<OnEnter> call should complete "immediately", I'd expect elapsed to be bellow 16ms. Ie, we should not wait at all, and should do a relatively small amount of CPU work. 2000ms is well over the budget we have here.

OTOH, the background work we need to do, without blocking OnEnter, is well over two seconds, so 2 seconds seems like a good arbitrary cutoff point which should not give any false positives or false negatives.

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

In other words, it is time-senstive in an absolute sense, but we have like 3 orders of magnitude error margin here in both directions

@kjeremy kjeremy deleted the clippy-lints branch January 13, 2020 17:57
@edwin0cheng
Copy link
Contributor

edwin0cheng commented Jan 13, 2020

Here is the output of in my Windows machine: (I dbg the elapsed time)

running 1 test
[crates\ra_lsp_server\tests\heavy_tests\main.rs:532] elapsed = 13.8147ms
test diagnostics_dont_block_typing ... test diagnostics_dont_block_typing has been running for over 60 seconds
test diagnostics_dont_block_typing ... ok
  1. It used 13.8147ms, so it is quite fast.
  2. However, it used over 60 seconds to shutdown the test.

@kjeremy
Copy link
Contributor Author

kjeremy commented Jan 13, 2020

@matklad it still passes locally with RUN_SLOW_TESTS=1

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020

@edwin0cheng yeah, this is exactly the expected behavior: we test that we quickly complete "foreground" job, while background tasks slowly completes in background.

@edwin0cheng
Copy link
Contributor

edwin0cheng commented Jan 13, 2020

Okay, I could almost reproduce this bug by setting the CPU Affinity to 1 in Task Manager:

[crates\ra_lsp_server\tests\heavy_tests\main.rs:535] elapsed = 929.6164ms
test diagnostics_dont_block_typing ... test diagnostics_dont_block_typing has been running for over 60 seconds

Maybe github windows VM is single core ?

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020 via email

@matklad
Copy link
Contributor

matklad commented Jan 13, 2020 via email

@edwin0cheng
Copy link
Contributor

Interesting! So that means we need to set thread priorities on windows
somehow?

I changed the code and did some tests on my Windows machine:

  1. Force it to run in cpu Affinity = 1
#[test]
fn diagnostics_dont_block_typing() {
    if skip_slow_tests() {
        return;
    }
    unsafe {
        let process = winapi::um::processthreadsapi::GetCurrentProcess();
        let mask = 1;  
        winapi::um::winbase::SetProcessAffinityMask(process, mask);
    }
  1. Print out the total elapsed time:
#[test]
fn diagnostics_dont_block_typing() {
    if skip_slow_tests() {
        return;
    }

   .....

    let elapsed = start.elapsed();
    assert!(elapsed.as_millis() < 2000, "typing enter took {:?}", elapsed);
    dbg!(elapsed);
}
  1. Print out the actual time used in that processing thread:
    pool_dispatcher
        .on_sync::<req::CollectGarbage>(|s, ()| Ok(s.collect_garbage()))?
        .on_sync::<req::JoinLines>(|s, p| handlers::handle_join_lines(s.snapshot(), p))?
        .on_sync::<req::OnEnter>(|s, p| {
            use std::io::Write;
            let _ = writeln!(&mut std::io::stderr(), "start!");
            let start = std::time::Instant::now();
            let r = handlers::handle_on_enter(s.snapshot(), p);
            dbg!(start.elapsed());
            r
        })?

The result is quite interesting :

start!
[crates\ra_lsp_server\src\main_loop.rs:485] start.elapsed() = 1.0432ms
[crates\ra_lsp_server\tests\heavy_tests\main.rs:537] elapsed = 1.1332555s

You can see the actual processing time is quite fast (1.0432ms), but the total elapsed time is around 1 second, which means we are blocking on the main loop !!

So I added a SetThreadPriority call in our main_loop :

pub fn main_loop(
    ws_roots: Vec<PathBuf>,
    client_caps: ClientCapabilities,
    config: ServerConfig,
    connection: Connection,
) -> Result<()> {
    ....
    unsafe {
        let tid = winapi::um::processthreadsapi::GetCurrentThread();
        winapi::um::processthreadsapi::SetThreadPriority(tid, 1);
    }        

    log::info!("server initialized, serving requests");
    {
        let task_sender = task_sender;
        let libdata_sender = libdata_sender;
        loop {
            log::trace!("selecting");
            let event = select! {
   ...

And it is the result :

start!
[crates\ra_lsp_server\src\main_loop.rs:485] start.elapsed() = 1.3183ms
[crates\ra_lsp_server\tests\heavy_tests\main.rs:537] elapsed = 98.3874ms

@matklad
Copy link
Contributor

matklad commented Jan 14, 2020

Fascinating! Opened #2835

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants