• Very hard to debug issue

    From meshparts to comp.lang.tcl on Wed Mar 4 09:03:14 2026
    From Newsgroup: comp.lang.tcl

    I have a rather complicated process involving threads, channels and
    system processes. So far, it fails on only one computer.
    The environment is Windows 11.

    The errors I get are pretty random and I don't think they have much
    relevance. I have a hunch that they are older, caught errors,
    unrelated to the actual issue.

    So the process is:

    1. From the main thread, send a command to a parallel thread. The body
    of that command is shown below.
    2. In the parallel thread, that command opens some communication
    pipes and starts a new system process (a Tcl executable) with some
    command line parameters (variable $cmdline below).
    3. It waits for the system process to end and returns the result.

    Like I said, the above steps fail, but so far only on one single PC.
    Many other PCs are in use with no issues.

    The user on that PC was given full admin rights, and the problem is
    still there. The firewall and antivirus were checked and, according to
    IT, they block nothing.

    In order to provide a workaround, I changed the first step so that
    the command is executed directly in the main thread instead of in a
    parallel thread. This works!

    But why?

    This is very hard to debug, because every change means changing the
    Tcl executable app. I might come up with another solution for this, though.

    But still, any ideas why the command below fails only on one PC and
    only when executed in a parallel thread?

    Is there something wrong with the chan pipes?

    Could it still be an OS problem, something blocking the execution of an executable, but only when the executable is started from the main app?

    Many thanks
    Alex


    # Open different channels for read and write
    lassign [chan pipe] processStdin writeChanId
    lassign [chan pipe] readChanId processStdoutErr
    # React when the read channel becomes readable (meaning, a new line was written to the channel)
    chan event $readChanId readable "::meshparts::GeneratePartBatchRead $readChanId $processStdoutErr"
    # Start the part generation process
    set res [::twapi::create_process "" -cmdline $cmdline -returnhandles 1 -inherithandles 1 -stdchannels [list $processStdin $processStdoutErr $processStdoutErr]]
    # Store the process handles
    lassign $res pid tid hproc hthread
    # Wait for the process to end
    set ::meshparts::twapi_pid($pid) ""
    ::twapi::wait_on_handle $hproc -executeonce 1 -async [list ::meshparts::GlobalVarSetFromTwapiCallback ::meshparts::twapi_pid($pid) ""]
    vwait ::meshparts::twapi_pid($pid)
    ::twapi::close_handle $hproc
    # Get the last line written, which is normally the path of the generated model file
    if {[info exists lastline($readChanId)]} {
        set result $lastline($readChanId)
    } else {
        set result ""
    }
    catch {close $readChanId}
    catch {close $writeChanId}
    catch {close $processStdin}
    catch {close $processStdoutErr}
    array unset lastline $readChanId
    return $result
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From undroidwish to comp.lang.tcl on Wed Mar 4 10:12:14 2026

    On 3/4/26 09:03, meshparts wrote:

    A shot in the dark, change

    ...
    # Wait for the process to end
    set ::meshparts::twapi_pid($pid) ""
    ::twapi::wait_on_handle $hproc -executeonce 1 -async [list ::meshparts::GlobalVarSetFromTwapiCallback ::meshparts::twapi_pid($pid) ""]
    vwait ::meshparts::twapi_pid($pid)
    ...

    to

    set ::meshparts::twapi_pid($pid) ""

    ::twapi::wait_on_handle $hproc -executeonce 1 -async \
        [list after idle \
            [list ::meshparts::GlobalVarSetFromTwapiCallback \
                ::meshparts::twapi_pid($pid) ""]]

    vwait ::meshparts::twapi_pid($pid)

    in order to catch a potential timing issue in that the
    wait_on_handle operation completes before the vwait.
    By using "after idle" you enforce an additional iteration
    of the event loop.
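    Below is a minimal, self-contained Tcl sketch of why the "after idle" wrapper matters. It assumes the worst case described above, where a notifier runs its callback synchronously, before the caller reaches vwait. fakeWaitOnHandle is a hypothetical stand-in and is not twapi::wait_on_handle. The thread does not establish when twapi actually invokes its -async callback.

```tcl
# Hypothetical stand-in for a completion notifier that runs its callback
# immediately, i.e. before the caller reaches vwait. This is NOT
# twapi::wait_on_handle; it only models the race being guarded against.
proc fakeWaitOnHandle {callback} {
    uplevel #0 $callback
}

set ::done ""

# Without [after idle], [set ::done finished] would run right here, and the
# vwait below would wait for a second write that never comes.
# With [after idle], the real setter is only queued for the event loop.
fakeWaitOnHandle [list after idle [list set ::done finished]]
set valueBeforeVwait $::done

# vwait enters the event loop, the idle callback writes ::done, vwait returns.
vwait ::done
puts "before vwait: '$valueBeforeVwait', after vwait: '$::done'"
```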

    HTH,
    Christian
  • From Ralf Fassel to comp.lang.tcl on Wed Mar 4 12:09:32 2026

    * meshparts
    | So the process is:

    | 1. From the main thread, send a command to a parallel thread. The body
    | of that command is shown below.
    | 2. In the parallel thread, that command opens some communication
    | pipes and starts a new system process (a Tcl executable) with some
    | command line parameters (variable $cmdline below).
    | 3. It waits for the system process to end and returns the result.

    | Like I said, the above steps fail, but so far only on one single PC.

    Please always specify *how* it fails: are there error messages? Does it
    block somehow? Does it return wrong results?

    | In order to provide a workaround, I changed the first step so that
    | the command is executed directly in the main thread instead of in a
    | parallel thread. This works!

    Ditto: what does 'works' mean? No more error messages? No blocking?
    Which results?

    | # Open different channels for read and write
    | lassign [chan pipe] processStdin writeChanId
    | lassign [chan pipe] readChanId processStdoutErr

    I think you need to close some of these in the parent after spawning the
    child; see the chan manpage (last paragraph cited):

    chan pipe
    Creates a standalone pipe whose read- and write-side channels are
    returned as a 2-element list, the first element being the read
    side and the second the write side. Can be useful e.g. to redirect
    separately stderr and stdout from a subprocess. To do this,
    spawn with "2>@" or ">@" redirection operators onto the write
    side of a pipe, and then immediately close it in the parent. This
    is necessary to get an EOF on the read side once the child has
    exited or otherwise closed its output.
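    Below is an in-process Tcl sketch of the EOF rule in that paragraph. No child process is involved, so it only illustrates the pipe semantics: the read side of a [chan pipe] reports EOF only after the write side has been closed. If the write side stayed open, the blocking gets loop would wait forever. That is the situation when the parent keeps its own copy of the write side that was handed to the child.

```tcl
# Create a standalone pipe: first element is the read side, second the write side.
lassign [chan pipe] rd wr

puts $wr "last line from child"
flush $wr

# Close the write side. In the real code this corresponds to the parent closing
# its copy after the child has inherited it; without it, the loop below blocks.
close $wr

set lines {}
while {[gets $rd line] >= 0} {
    lappend lines $line
}
set reachedEof [eof $rd]
close $rd
puts "lines: $lines, eof: $reachedEof"
```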

    | # React when the read channel becomes readable (meaning, a new line
    | # was written to the channel)
    | chan event $readChanId readable "::meshparts::GeneratePartBatchRead $readChanId $processStdoutErr"

    NB. Always a good idea to build callbacks via [list] instead of "".
    Does not matter here, but if you get into the habit, one less possible
    error cause in other situations.
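    Here is a small Tcl sketch of that difference, using a value that contains a space. The names report and path are hypothetical. A callback built with "" is split into words again when it is evaluated, while one built with [list] keeps each argument intact.

```tcl
# Hypothetical callback with exactly one argument.
proc report {path} {
    return "got: $path"
}

set path {C:/Program Files/meshparts}

# Built as a string: when evaluated, the space splits $path into two words.
set byString "report $path"
# Built with [list]: $path is quoted so it stays a single word.
set byList [list report $path]

# The string version fails with "wrong # args"; the list version works.
set stringFailed [catch {uplevel #0 $byString} stringError]
set listResult [uplevel #0 $byList]
puts "string: $stringError"
puts "list:   $listResult"
```

    The same reasoning applies to any script that is stored and evaluated later, such as chan event, after and thread::send callbacks.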

    | # Wait for the process to end
    | set ::meshparts::twapi_pid($pid) ""
    | ::twapi::wait_on_handle $hproc -executeonce 1 -async [list ::meshparts::GlobalVarSetFromTwapiCallback ::meshparts::twapi_pid($pid) ""]
    | vwait ::meshparts::twapi_pid($pid)

    I don't know how twapi handles processes, but if
    twapi::wait_on_handle decided that it does not need to wait (e.g. because
    the process has already exited) and set the
    ::meshparts::twapi_pid($pid) variable before returning, the vwait would
    block forever. You could check here that the variable is still set to
    what it was before the call to ::twapi::wait_on_handle, and only vwait
    if this is the case.

    My €0.02

    R'
  • From meshparts to comp.lang.tcl on Wed Mar 4 12:35:06 2026

    On 04.03.2026 at 10:12, undroidwish wrote:
    in order to catch a potential timing issue in that the
    wait_on_handle operation completes before the vwait.
    By using "after idle" you enforce an additional iteration
    of the event loop.

    Thanks! I'll try that.
  • From meshparts to comp.lang.tcl on Wed Mar 4 12:37:43 2026

    On 04.03.2026 at 12:09, Ralf Fassel wrote:
    Please always specify *how* it fails: are there error messages? Does it
    block somehow? Does it return wrong results?

    Sorry, normally I do specify that, but in this case I'm very unsure whether those messages are relevant, as I wrote.
    The only thing that is common to the messages is this: "Das system kann
    die angegebene Datei nicht finden" which means "The system cannot find
    the specified file."



  • From meshparts to comp.lang.tcl on Wed Mar 4 12:40:18 2026

    On 04.03.2026 at 12:09, Ralf Fassel wrote:
    Ditto: what does 'works' mean? No more error messages? No blocking?
    Which results?

    Not working means that I very quickly get that error and some others
    that are unrelated.

    I think the actual Tcl executable app does not get executed at all. So
    the result is empty.

    When it works: the executable does lots of things, including calling
    another third-party executable, generating some files, and reading and
    writing on the drive.

  • From meshparts to comp.lang.tcl on Wed Mar 4 12:41:09 2026

    On 04.03.2026 at 12:09, Ralf Fassel wrote:
    I think you need to close some of these in the parent after spawning the child; see the chan manpage (last paragraph cited):

    I actually close those pipes at the end:

    catch {close $readChanId}
    catch {close $writeChanId}
    catch {close $processStdin}
    catch {close $processStdoutErr}

  • From meshparts to comp.lang.tcl on Wed Mar 4 12:42:09 2026

    On 04.03.2026 at 12:09, Ralf Fassel wrote:
    Always a good idea to build callbacks via [list] instead of "".

    Yes, I know. It's just that I switch between both. I use "" only in
    safe situations.
  • From Rich to comp.lang.tcl on Wed Mar 4 13:05:20 2026

    meshparts wrote:
    # Start the part generation process
    set res [::twapi::create_process "" -cmdline $cmdline -returnhandles 1 -inherithandles 1 -stdchannels [list $processStdin $processStdoutErr $processStdoutErr]]

    One of the error messages, which **should** have been in this message,
    you finally included in a followup (always remember that none of us
    here can read your mind over USENET):

    The only thing that is common to the messages is this: "Das system kann
    die angegebene Datei nicht finden" which means "The system cannot find
    the specified file."

    Given this, what are the contents, and type, of "$cmdline"?

    You will very likely get this exact error if $cmdline is a plain
    string, and if the first parameter therein is a path containing one or
    more space characters somewhere. Note the caution in the twapi docs
    over "create_process":

    Caller should ensure that CMDLINE is quoted appropriately.

    So I suspect you get the failure message on this one PC because this one user
    has a directory naming setup where $cmdline contains a path with spaces.
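    Below is a minimal Tcl sketch of what quoting "appropriately" can look like. It assumes the started program splits its command line using the usual Microsoft C runtime rules, as documented for CommandLineToArgvW, and that the program is started directly rather than through cmd.exe. quoteWinArg and buildCmdline are hypothetical helpers and are not part of twapi. Their result would be passed as the -cmdline value.

```tcl
# Quote one argument for a Windows command line (Microsoft C runtime rules):
# - arguments without whitespace or double quotes are passed unchanged;
# - otherwise the argument is wrapped in double quotes, embedded double
#   quotes are escaped with a backslash, and backslashes are doubled only
#   where they precede a double quote (including the closing one).
proc quoteWinArg {arg} {
    if {$arg ne "" && ![regexp {[[:space:]"]} $arg]} {
        return $arg
    }
    set out "\""
    set backslashes 0
    foreach ch [split $arg ""] {
        if {$ch eq "\\"} {
            incr backslashes
        } elseif {$ch eq "\""} {
            append out [string repeat "\\" [expr {2 * $backslashes + 1}]] "\""
            set backslashes 0
        } else {
            append out [string repeat "\\" $backslashes] $ch
            set backslashes 0
        }
    }
    append out [string repeat "\\" [expr {2 * $backslashes}]] "\""
    return $out
}

# Build a complete command line from individual arguments.
proc buildCmdline {args} {
    join [lmap a $args {quoteWinArg $a}] " "
}

puts [buildCmdline {C:\Program Files\meshparts\tclsh.exe} generate.tcl {my part}]
```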
  • From Christian Gollwitzer to comp.lang.tcl on Wed Mar 4 21:26:48 2026

    On 04.03.26 at 12:37, meshparts wrote:
    On 04.03.2026 at 12:09, Ralf Fassel wrote:
    Please always specify *how* it fails: are there error messages? Does it
    block somehow? Does it return wrong results?

    Sorry, normally I do specify that, but in this case I'm very unsure whether those messages are relevant, as I wrote.
    The only thing that is common to the messages is this: "Das system kann
    die angegebene Datei nicht finden" which means "The system cannot find
    the specified file."


    It can be something as stupid as a user profile directory containing
    spaces or non-Latin-1 characters (e.g. the user has the login André, or a Chinese locale...).
    I'd try printing the parameters for exec instead of executing it. Then
    compare them between the main thread and the parallel thread. Also compare
    the contents of env() and the working directory.
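    One possible way to do that comparison is sketched below in Tcl. It captures the command line, working directory and PATH at each place the process is launched, logs them, and diffs the results. launchContext, diffContexts and logContext are hypothetical helpers, and further env() entries can be added the same way.

```tcl
# Capture the parts of the launch environment worth comparing.
proc launchContext {cmdline} {
    dict create \
        cmdline $cmdline \
        pwd     [pwd] \
        path    [expr {[info exists ::env(PATH)] ? $::env(PATH) : ""}]
}

# Return the keys of $a whose values are missing from, or differ in, $b.
proc diffContexts {a b} {
    set diff {}
    dict for {key value} $a {
        if {![dict exists $b $key] || [dict get $b $key] ne $value} {
            lappend diff $key
        }
    }
    return $diff
}

# Write a context to a channel, one labelled "key value" line per entry.
proc logContext {chan label ctx} {
    dict for {key value} $ctx {
        puts $chan [format "%-8s %-8s %s" $label $key $value]
    }
}

# Example: two launches that differ only in how cmdline was built.
set mainCtx   [launchContext {"C:/Program Files/app/tclsh.exe" generate.tcl}]
set workerCtx [launchContext {C:/Program Files/app/tclsh.exe generate.tcl}]
logContext stdout main   $mainCtx
logContext stdout worker $workerCtx
puts "differs in: [diffContexts $mainCtx $workerCtx]"
```

    In the threaded setup, the worker's context could be returned to the main thread as the result of a thread::send and compared there.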

    Christian

  • From et99 to comp.lang.tcl on Wed Mar 4 17:09:39 2026

    On 3/4/2026 5:05 AM, Rich wrote:
    meshparts wrote:
    # Start the part generation process
    set res [::twapi::create_process "" -cmdline $cmdline -returnhandles 1 -inherithandles 1 -stdchannels [list $processStdin $processStdoutErr $processStdoutErr]]

    One of the error messages, which **should** have been in this message,
    you finally included in a followup (always remember that none of us
    here can read your mind over USENET):

    The only thing that is common to the messages is this: "Das system kann
    die angegebene Datei nicht finden" which means "The system cannot find
    the specified file."

    Given this, what are the contents, and type, of "$cmdline"?

    You will very likely get this exact error if $cmdline is a plain
    string, and if the first parameter therein is a path containing one or
    more space characters somewhere. Note the caution in the twapi docs
    over "create_process":

    Caller should ensure that CMDLINE is quoted appropriately.

    So I suspect you get the failure message on this one PC because this one user
    has a directory naming setup where $cmdline contains a path with spaces.


    On Windows, there are several tools that can be used to determine the true command line that is executed. I've found that when using Tcl to launch something, it's not always exactly what I thought it would be, especially when using exec. Quoting can be particularly confusing sometimes.

    I see however, that the OP is using twapi, which likely doesn't have this problem. However, here is how I would capture the command lines of all programs being launched - even if they exit immediately.


    First, the tool, from Microsoft, is at:

    https://learn.microsoft.com/en-us/sysinternals/downloads/procmon

    Then, after starting the tool (it's in a zip: no install needed, just extract and run), one would go into the Filter menu, item Filter...

    (if this is the first time using the tool, click Reset)

    Add one item and make sure it is checked:

    Operation is Process Start then Include

    Click Apply or OK.

    All running programs from then on will log a single line including the true command line as run. There are many other filters that can be turned on such as setting the process name to the name of the program.exe that is being run. This program runs as some kind of driver, so it can capture events where a program runs, gets an error, and quickly exits.


    Alternatively, if the program doesn't exit immediately, there is Process Explorer, found here:

    https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer

    And this will by default show all the processes running. One can simply hover the mouse pointer over any process and it will show the command line used to launch that program.

    -e

  • From meshparts to comp.lang.tcl on Thu Mar 5 07:19:26 2026

    On 05.03.2026 at 02:09, et99 wrote:
    I see however, that the OP is using twapi, which likely doesn't have
    this problem. However, here is how I would capture the command lines of
    all programs being launched - even if they exit immediately.

    I'm waiting for a result from the customer; then I'll have a log file
    that contains the executed command line.
    Actually, I think the command line is not executed at all at the system level.
    But we will see...
  • From meshparts to comp.lang.tcl on Thu Mar 5 07:20:59 2026

    On 04.03.2026 at 14:05, Rich wrote:
    So I suspect you get the failure message on this one PC because this one user
    has a directory naming setup where $cmdline contains a path with spaces.

    We already tested against that. It happens with any directory, even
    without spaces.
    Besides: Why is it happening only in a subprocess, not in the main process?
  • From meshparts to comp.lang.tcl on Thu Mar 5 07:25:33 2026

    On 04.03.2026 at 21:26, Christian Gollwitzer wrote:
    It can be something as stupid as a user profile directory containing
    spaces or non-Latin-1 characters (e.g. the user has the login André, or a Chinese locale...).
    I'd try printing the parameters for exec instead of executing it. Then
    compare them between the main thread and the parallel thread. Also compare
    the contents of env() and the working directory.

    Yes, I added commands that will log things, including the command line.
    Waiting for the result from the customer...
  • From Rich to comp.lang.tcl on Thu Mar 5 13:29:43 2026

    meshparts wrote:
    On 04.03.2026 at 14:05, Rich wrote:
    So I suspect you get the failure message on this one PC because this one user
    has a directory naming setup where $cmdline contains a path with spaces.
    We already tested against that. It happens with any directory, even
    without spaces.

    Besides: Why is it happening only in a subprocess, not in the main
    process?

    How should we know? This is the first time in this thread you've
    actually told us it does not happen in the main process. If you want
    help, you have to provide the relevant information for us to help.

    If the exact same fork of a process is failing only in a thread, that
    says that something environment wise may be different between the
    thread and the main process. So check that the windows environment
    variables are the same between main and thread (esp. the PATH
    environment variable).

    But I have great doubts that the call is **exactly the same** in
    both. It is much more likely that the main thread is calling something
    different, and that difference is why it succeeds while the thread does not.
  • From meshparts to comp.lang.tcl on Thu Mar 5 16:43:23 2026

    On 05.03.2026 at 14:29, Rich wrote:
    How should we know? This is the first time in this thread you've
    actually told us it does not happen in the main process. If you want
    help, you have to provide the relevant information for us to help.

    Not true!
    See my original post:
    "In order to provide a workaround, I changed the first step so that
    the command is executed directly in the main thread instead of in a
    parallel thread. This works!"
  • From meshparts to comp.lang.tcl on Thu Mar 5 16:45:06 2026

    On 05.03.2026 at 14:29, Rich wrote:
    If the exact same fork of a process is failing only in a thread, that
    says that something environment wise may be different between the
    thread and the main process. So check that the windows environment
    variables are the same between main and thread (esp. the PATH
    environment variable).

    I will know more when the user sends the log file.
    He hasn't done so yet...
  • From Rich to comp.lang.tcl on Thu Mar 5 17:02:01 2026

    meshparts wrote:
    On 05.03.2026 at 14:29, Rich wrote:
    How should we know? This is the first time in this thread you've
    actually told us it does not happen in the main process. If you want
    help, you have to provide the relevant information for us to help.
    Not true!
    See my original post:
    "In order to provide a workaround, I changed the first step so that
    the command is executed directly in the main thread instead of in a
    parallel thread. This works!"

    Ah, fair enough, I did miss that.

    In any case, that still says that there is /something/ different, most
    likely some windows environment setting, between the two.
  • From et99 to comp.lang.tcl on Thu Mar 5 11:49:34 2026

    On 3/5/2026 7:45 AM, meshparts wrote:
    On 05.03.2026 at 14:29, Rich wrote:
    If the exact same fork of a process is failing only in a thread, that
    says that something environment wise may be different between the
    thread and the main process.  So check that the windows environment
    variables are the same between main and thread (esp. the PATH
    environment variable).
    I will know more when the user sends the log file.
    He hasn't done so yet...

    I didn't see whether you said that this occurs all the time or intermittently. If the latter, it's unlikely to be a configuration issue and more likely a timing issue. Perhaps this one PC is slower or faster (or has fewer or more cores), and the problem is lurking on the other PCs as well.

    As Christian noted, what if that vwait on ::meshparts::twapi_pid($pid) found that the variable was already set?

    My question is: IF it did hang forever on that vwait, would that cause the failure you are seeing? Usually a hung process has a different symptom, unless the hang is in a separate thread and something in the main thread (not hung) then errors out.

    The way I ensure that a variable hasn't already been set before I vwait on it (when this is a possibility, which is more likely with threading code) is to use an unset, like this:


    # -nocomplain: the element may not exist yet
    unset -nocomplain ::meshparts::twapi_pid($pid)
    # ... code that triggers the setting asynchronously ...
    if {![info exists ::meshparts::twapi_pid($pid)]} {
        vwait ::meshparts::twapi_pid($pid)
    }


    This way, if it has already been set, you will not vwait for it. The test is atomic in Tcl, since the event loop can't be entered between the info exists and the vwait.

    Christian said his code also does this, and he is very probably correct, so my technique may not be needed. But this is how I deal with this problem when using thread::send.

    Perhaps the reason you don't see the problem when single threaded is that the timing is different.

    -e

  • From meshparts to comp.lang.tcl on Fri Mar 6 08:32:36 2026

    On 05.03.2026 at 20:49, et99 wrote:
    I didn't see whether you said that this occurs all the time or intermittently.

    It's happening all the time, but only on that one machine.
    I just got the log from the user.
    I'll get back to you with a resolution, hopefully.
  • From Rich to comp.lang.tcl on Fri Mar 6 14:51:50 2026

    meshparts wrote:
    On 05.03.2026 at 20:49, et99 wrote:
    I didn't see whether you said that this occurs all the time or intermittently.

    It's happening all the time, but only on that one machine.
    ^^^^^^^^^^^^^^^^^^^^^^^^

    Which is the part that very much points to /something/ specific to that
    one machine. Either path names with spaces or utf-8 characters that
    are not handled correctly all the way through, or PATH= setting
    differences.

    I just got the log from the user.
    I'll get back to you with a resolution hopefully.

  • From meshparts to comp.lang.tcl on Fri Mar 6 18:39:21 2026

    On 06.03.2026 at 15:51, Rich wrote:
    Which is the part that very much points to /something/ specific to that
    one machine. Either path names with spaces or utf-8 characters that
    are not handled correctly all the way through, or PATH= setting
    differences.

    Yes, I know.
    This is why I initially suspected the firewall or the antivirus.

    I got some logs from the user and it looks like the issue is in the
    command below, which initializes the threads.

    thread::create -preserved "namespace eval ::meshparts {} {variable sourceexe $sourceexe; variable sourcedir \"$sourcedir\"; variable sourcefile \"$sourcefile\"}; thread::wait"

    Pretty dumb error: in one place the quotes around $sourceexe are
    missing, and sourceexe is a path which may contain spaces.
    All other paths are enclosed in quotes.

    It's amazing that the code worked until now, since this variable is
    central to the working of the whole app.

    Could be that no other user stores the executable in a directory with
    spaces. Amazing!

    I know I should have used "list" to build up the string, but in this
    case there are semicolons (";") in the string, and I always have
    trouble combining ";" with the "list" command.
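    Below is a Tcl sketch of building that init script with [list] while still using ";" as the separator. Each [list ...] produces one well-formed command, and join places the ";" between commands, never inside a value. buildThreadInitScript is a hypothetical helper. To keep the example runnable without the Thread package, the script is evaluated in a child interpreter with a stub thread::wait. In the app, the result would go to thread::create -preserved as before. Even the quoted "..." form in the original would still mangle values that contain a double quote or a backslash sequence such as \t in C:\temp, which [list] avoids.

```tcl
# Build the init script for a worker thread. Each [list ...] is one command
# whose arguments stay single words; join adds ";" between the commands only.
proc buildThreadInitScript {sourceexe sourcedir sourcefile} {
    set body [join [list \
        [list variable sourceexe  $sourceexe] \
        [list variable sourcedir  $sourcedir] \
        [list variable sourcefile $sourcefile]] ";"]
    return [join [list [list namespace eval ::meshparts $body] thread::wait] ";"]
}

# In the app (requires the Thread package):
#   set tid [thread::create -preserved \
#       [buildThreadInitScript $sourceexe $sourcedir $sourcefile]]

# Runnable check without the Thread package: evaluate the script in a child
# interpreter that provides a do-nothing stand-in for thread::wait.
set exe {C:\Program Files\meshparts\meshparts.exe}
set dir {C:\temp\parts}
set child [interp create]
$child eval {namespace eval ::thread {proc wait {} {}}}
$child eval [buildThreadInitScript $exe $dir part.tcl]
set gotExe [$child eval {set ::meshparts::sourceexe}]
set gotDir [$child eval {set ::meshparts::sourcedir}]
interp delete $child
puts "sourceexe: $gotExe"
puts "sourcedir: $gotDir"
```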

    So I think this is problem solved, but I will know more on Monday.

    Many thanks to all!

    Cheers
    Alex