Can't understand why ASCII value "SOH" ('1') added to my array

 
Greenhorn
Posts: 18
Hello,

I'm facing a problem which I do not understand at all. I'm writing a program which extracts digits from a file. I will cut to the chase:
Every line in the file I'm extracting the numbers from looks like this: "00:00:01,194 --> 00:00:02,255", which means a dialogue (from a movie/doc) starts at 1 second and 194 thousandths and ends at 2 seconds and 255 thousandths. My code is the following:


Using GDB, I can see that the first iteration runs fine. Every index of the array "left" is properly filled with the digits. The issue comes in the second iteration (when row is 1): I can see that the arrays "left" and "right" are cleared out by memset( ), but when the first digit is added to the "left" array at index 0, index 1 gets populated with hex value 1, which is ASCII for "Start of Heading", and I don't understand why. As an example, let's say we are working with the data from arr[1] because we are done with arr[0]:
The "left" array right now looks as follows because memset( ) was just executed:


Now line 14 gets executed and the character is the digit '0'; the if-else statements are checked and line 21 is executed. The "left" array now looks as follows:


The next character is also a '0'; the if-else statements are checked and the digit is added. Now the array is as follows:


The file (an .srt file) that I read with a FILE pointer:
   

Hope this helps!
 
Marshal
Posts: 79471
Have you verified where that \001 is in your file? Have you verified that all the characters in that file are ≤ 0xff? What is the encoding for your file? If you have any characters ≥ 0x100, or anything non‑ASCII, there is the possibility of an unexpected pairing.
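
One way to answer those questions is to scan the file byte by byte. The sketch below is only an illustration: `find_suspect_byte` and `scan_file` are hypothetical helper names, not code from this thread, and the choice of which bytes count as "suspect" is an assumption. When a file is read as bytes, every value is at most 0xff, so the check looks for control bytes such as \001 and for bytes above 0x7E that would indicate a non-ASCII encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the offset of the first byte in buf[0..len) that is neither
 * printable ASCII (0x20..0x7E) nor '\n', '\r' or '\t'.
 * Returns -1 if every byte passes. Hypothetical helper name. */
long find_suspect_byte(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        int ok = (b >= 0x20 && b <= 0x7E) ||
                 b == '\n' || b == '\r' || b == '\t';
        if (!ok)
            return (long)i;
    }
    return -1;
}

/* Scans a whole file in binary mode. Returns the offset of the first
 * suspect byte, -1 if the file is clean, or -2 if it cannot be opened. */
long scan_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -2;

    unsigned char chunk[4096];
    long base = 0;          /* file offset of chunk[0] */
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        long hit = find_suspect_byte(chunk, n);
        if (hit >= 0) {
            fclose(fp);
            return base + hit;
        }
        base += (long)n;
    }
    fclose(fp);
    return -1;
}
```

If `scan_file` returns -1 for the .srt file, the stray \001 is not coming from the file itself and the program is the more likely source.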
 
Saloon Keeper
Posts: 27885
This is kind of gauche to me: "memset(left, '\0', 50)". If, as I presume, you mean for '\0' to stand for the octal byte value \000, I'd recommend that you follow common practice and write all 3 digits.

Although in actual fact, I'd simply code "memset(left, 0, 50)", as "0" is the universal nothing in C, signifying ASCII NUL, floating-point 0.0, and null pointer value (which, interestingly, on some systems is not actually address 0).

Then again, I think that there may be a standard memset-equivalent that's explicitly intended to deal with character arrays. A character, after all, isn't always exactly one byte in the wider world.

As to getting a binary 1 for pass 1, my first instinct would be to see if the loop index was somehow getting pushed into a character.

But I think your code would probably be clearer (and easier to debug!) if you considered doing that central loop block using a switch statement.
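
For what it's worth, the different spellings of the fill value can be compared directly. This is a minimal sketch, with `clears_to_zero` as a hypothetical helper name; it only shows that '\0', '\000', '\x00' and plain 0 all have the value zero, so the choice between them does not change what memset() writes. The character '0' (the digit zero) is a different value and does not clear the buffer.

```c
#include <assert.h>
#include <string.h>

/* Fills buf with the given value via memset() and reports whether every
 * byte ended up as zero. Returns 1 if all bytes are zero, 0 otherwise.
 * Hypothetical helper, used only for this demonstration. */
int clears_to_zero(char *buf, size_t size, int fill)
{
    assert(buf != NULL);
    memset(buf, fill, size);
    for (size_t i = 0; i < size; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

Called with `sizeof left` rather than a hard-coded 50, the helper keeps working if the array size changes later.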
 
Adrian Meneses
Greenhorn
Posts: 18

Campbell Ritchie wrote:Have you verified where that \001 is in your file? Have you verified that all the characters in that file are ≤ 0xff? What is the encoding for your file? If you have any characters ≥ 0x100, or anything non‑ASCII, there is the possibility of an unexpected pairing.


This is a bit out of my scope; I'm not sure what you are asking.
 
Adrian Meneses
Greenhorn
Posts: 18

Tim Holloway wrote:This is kind of gauche to me: "memset(left, '\0', 50)". If, as I presume, you mean for '\0' to stand for the octal byte value \000, I'd recommend that you follow common practice and write all 3 digits.

Although in actual fact, I'd simply code "memset(left, 0, 50)", as "0" is the universal nothing in C, signifying ASCII NUL, floating-point 0.0, and null pointer value (which, interestingly, on some systems is not actually address 0).

Then again, I think that there may be a standard memset-equivalent that's explicitly intended to deal with character arrays. A character, after all, isn't always exactly one byte in the wider world.

As to getting a binary 1 for pass 1, my first instinct would be to see if the loop index was somehow getting pushed into a character.

But I think your code would probably be clearer (and easier to debug!) if you considered doing that central loop block using a switch statement.


I set '\0' with memset because I was out of ideas after trying so many other things (one of those being '0'). I tried to implement the switch statement, but at the time I was working with arrays that had been malloc'ed. I will give it another shot with memset(left, 0, 50) so as to ease your eyes haha!
 
Rancher
Posts: 508
Hi Adrian

Needed to fix a few things to get your code to compile. But the first real bug I spotted is:
strcat = string concatenation, and both arguments have to be strings. But c isn't a string - yes, it's a character, but to be a string of length 1 it must be followed by the string terminator character '\0'. The way the code is at the moment, whatever follows c in memory could be anything. This might result in a crash - either when the strcat() code goes off into invalid memory trying to find the '\0', or it finds one but the resulting string is too long to fit into left[] and it overwrites something it shouldn't.

Fix this, and it might work - I did, and I think it does.
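
A minimal sketch of one possible fix, assuming the failing call had the shape strcat(left, &c). The helper name `append_char` and the capacity check are illustrative additions, not the code that was actually tested above. The idea is to build a real one-character string before handing it to strcat().

```c
#include <assert.h>
#include <string.h>

/* Appends the single character c to the NUL-terminated string in dst,
 * which has room for cap bytes in total. Returns 1 on success, 0 if the
 * result would not fit. Hypothetical helper name. */
int append_char(char *dst, size_t cap, char c)
{
    assert(dst != NULL);

    /* Wrong: strcat(dst, &c) keeps reading past c until it happens to
     * find a '\0', so stray bytes such as \001 can be copied too. */
    char tmp[2] = { c, '\0' };      /* a genuine string of length 1 */

    if (strlen(dst) + 1 >= cap)     /* need room for c and the terminator */
        return 0;
    strcat(dst, tmp);
    return 1;
}
```

A call such as `append_char(left, sizeof left, c)` then replaces the original strcat() call, and the return value shows when left[] is full.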

A few points:
  • Why use (x - y == 0) instead of (x == y)?
  • To clear a string buffer you only need to set the first byte to '\0'; no need to memset() the whole buffer.
  • Even if this is only for your own use I would at least check the fopen() return value to verify that the file has been opened ok.
  • In your fgets() you use 100 to represent the size of buffer. But the size of buffer is sizeof(buffer) - much better to use that.
  • Not sure why you're using strtok() - you already have the string in buffer, what does strtok() with no delimiters give you?
  • You can use strchr() to check whether a character is one of a number of possibilities eg. strchr(":,->", c)

Cheers
John
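
The points in the list above can be sketched roughly as follows. All three function names are hypothetical, and counting lines that contain "-->" is just a stand-in for whatever the real loop does with each line. Note one assumption made explicit in the code: strchr() also matches the terminating '\0' of its first argument, so that case is excluded by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if c is one of the timestamp separators, 0 otherwise.
 * strchr() would also "find" '\0', so that case is ruled out first. */
int is_separator(char c)
{
    return c != '\0' && strchr(":,->", c) != NULL;
}

/* Empties a string buffer by writing the terminator at index 0;
 * the remaining bytes do not need to be touched. */
void clear_string(char *s)
{
    assert(s != NULL);
    s[0] = '\0';
}

/* Opens path and counts the lines that contain "-->".
 * Returns -1 if the file cannot be opened. */
long count_timing_lines(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {               /* always check fopen() */
        perror(path);
        return -1;
    }

    char buffer[100];
    long count = 0;
    /* sizeof buffer tracks the array size automatically. */
    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (strstr(buffer, "-->") != NULL)
            count++;
    }
    fclose(fp);
    return count;
}
```

Lines longer than the buffer are returned by fgets() in several pieces, so a larger buffer may be needed for files with long dialogue lines.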
     
    John Matthews
    Rancher
    Posts: 508

    Tim Holloway wrote:If, as I presume, you mean for '\0' to stand for the octal byte value \000, I'd recommend that you follow common practice and write all 3 digits.

    I think the intention is to set the whole buffer to the end of string character '\0', which is at least logically different to '\000' and correct in this context. But as I mentioned above unnecessary.
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885
    Let me take a stab:

    It's probably not 100% correct, but here's a couple of things to note:
    1. Although the multi-clause case for the "magic" characters is  a bit tedious, it does generate efficient code and it's easier to read, I think, despite the verbosity.
    2. You can't do "strcat(dest, &charsomething)". You allocated "c" on the stack and "c" is only one character long. The strcat function needs a terminating null and there's no telling what garbage might be following "c" in stack memory. Like maybe a \001. I used a couple of indexes for left and right concatenation because I'm an efficiency freak, but the net effect is supposed to be the same as left[strlen(left)] = c;. To make it work more like strcat, I added an extra instruction to terminate the expanded string, but in actual practice, I'd leave them open-ended and wait until the end of the loop, so I only had to write the terminating null once.
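
The approach described in these two notes might look something like the sketch below. It is a reconstruction under assumptions, not the code posted in this thread: the function name `split_digits`, its return convention, and the use of '>' to switch from the left buffer to the right one are all illustrative choices. It follows the "terminate once, after the loop" variant.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Splits one timing line such as "00:00:01,194 --> 00:00:02,255" into the
 * digits before the arrow (left) and the digits after it (right). Each
 * output buffer has room for cap bytes. Returns 0 on success, or -1 if a
 * buffer is too small or an unexpected character appears (in which case
 * the buffers are left unterminated). Hypothetical helper name. */
int split_digits(const char *line, char *left, char *right, size_t cap)
{
    assert(line != NULL && left != NULL && right != NULL && cap > 0);

    size_t li = 0, ri = 0;      /* next free index in left and right */
    int seen_arrow = 0;

    for (const char *p = line; *p != '\0'; p++) {
        switch (*p) {
        case ':': case ',': case '-': case ' ':
        case '\r': case '\n':
            break;              /* separators carry no data */
        case '>':
            seen_arrow = 1;     /* later digits belong to right */
            break;
        default:
            if (!isdigit((unsigned char)*p))
                return -1;
            if (seen_arrow) {
                if (ri + 1 >= cap)
                    return -1;
                right[ri++] = *p;
            } else {
                if (li + 1 >= cap)
                    return -1;
                left[li++] = *p;
            }
            break;
        }
    }

    left[li] = '\0';            /* terminate once, after the loop */
    right[ri] = '\0';
    return 0;
}
```

Because each character is stored by index, no temporary string and no strcat() call are needed, and nothing beyond the character itself is ever copied.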
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885

    John Matthews wrote:

    Tim Holloway wrote:If, as I presume, you mean for '\0' to stand for the octal byte value \000, I'd recommend that you follow common practice and write all 3 digits.

    I think the intention is to set the whole buffer to the end of string character '\0', which is at least logically different to '\000' and correct in this context. But as I mentioned above unnecessary.



    I've never seen '\0' used as a NUL. And rarely seen \000 used, except in old Unix code, where octal was a way of life. Then again, I usually have a

    #define NUL 0

    in my code anyway.
     
    John Matthews
    Rancher
    Posts: 508

    Tim Holloway wrote:I've never seen '\0' used as a NUL.

    I find that very hard to believe - it's basic stuff, page 37 of The C Programming Language
    http://www2.cs.uregina.ca/~hilder/cs833/Other%20Reference%20Materials/The%20C%20Programming%20Language.pdf

    Or https://www.tutorialspoint.com/cprogramming/c_strings.htm
    Or https://en.wikipedia.org/wiki/Null-terminated_string

    Or anywhere
     
    John Matthews
    Rancher
    Posts: 508

    John Matthews wrote:page 37 of The C Programming Language
    http://www2.cs.uregina.ca/~hilder/cs833/Other%20Reference%20Materials/The%20C%20Programming%20Language.pdf

    Sorry - printed page 31.
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885
    You win. But as I said, in the real world, I've not seen it used. Note that at the top of the page, they recommend the 3-digit octal format. And of course this book was written by the original Unix-heads, so it does reflect a lot of practices that are no longer in fashion. I don't think that 0 was acknowledged as the "universal nothing" back then, either.

    Now if I wanted to get really pedantic, I'd say something about literal ASCII not being the common character code for a very long time. ASCII is a 7-bit character code. The IBM PC promoted an extended code (ASCIIZ) that used all 8 bits (in 8-bit ASCII, the 8th bit was for parity and that led to some fun code). More recently, we tend to use UTF-8 and Unicode, both of which are even less ASCII-like.

    But regardless, for clarity's sake, I do recommend doing a #define for NUL rather than using an actual literal, no matter what its form. I prefer the readability and portability that you get from a named manifest constant. Actually, in modern C, better even than a #define would be "const char NUL = '\x00';" (Not being a DEC person, I prefer hex to octal). That way it's type-safe. The C++ standard iostreams do, in fact, define a "nul", if memory serves.
     
    John Matthews
    Rancher
    Posts: 508

    Tim Holloway wrote:You win . But as I said, in the real world, I've not seen it used.

    I'm curious to know where that world is, because it's obviously very different to the un-real world I've been (professionally) writing C in, start-ups and multi-nationals, for the past 30-odd years where I don't remember anyone ever using anything other than '\0' as a string terminator. That's a genuine question. And again if you can show me anything online that uses/advises something different I'd be interested - C specifically as that's mostly what I do. You learn something new every day
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885
    Well, I think I first started using C somewhere around 1980-81, when I paid $49 for Software Toolworks C for CP/M. One of the best products I've ever used. I did commercial team work on a major Macintosh product in the late '80's before transitioning to C++, having helped pioneer it on desktop computers in the form of the Lattice/SAS C++ product for Amiga (an attempt to port it to OS/2 was thwarted because the AT&T-licensed code didn't like working with segmented memory). I switched to Java in the mid-90's, but lately I've been doing a lot of Arduino code as well as occasional mods for open-source projects. But maybe I'm just living in an alternate universe.

    Still, I do recommend manifest constants rather than "magic numbers", no matter what syntax you use.
     
    John Matthews
    Rancher
    Posts: 508

    Tim Holloway wrote:Still, I do recommend manifest constants rather than "magic numbers", no matter what syntax you use.

    Definitely. But would you define a constant for the tab character '\t'? Or newline '\n'? Perhaps you would. But anyway, '\0' isn't any different to those - the number 0 isn't really a number in this context, it's just part of the 'symbol' for the NUL character.

    And if you were going to use a constant for the end of string character, you wouldn't use NUL - that's not much better than 0. You would define something like EOS (which might be defined as NUL) to abstract the meaning from the value.
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885
    Yes, as a matter of fact, when working with text-heavy apps - and I used to be heavily involved in compilers - I have defined manifest constants for TAB, CR, and NL and sometimes others.

    I would say that \0 is different than \t, however, since in C, numbers beginning with 0 have their own specific magic.

    And, actually, I do use NUL instead of EOS as my constant name, since it's common in the literature to reference "null-terminated strings" or say that "strings are terminated by a null character". Force of habit, then. Although to me, EOS is more a position than a character. Java, for example, has an EOS position, but not (officially) an EOS character.

    In the end, however, it's all just people's preferences, and if the crowd you hang out with differs from mine, it matters little functionally. What's more important: did you resolve your problem?
     
    John Matthews
    Rancher
    Posts: 508

    Tim Holloway wrote:What's more important: did you resolve your problem?

    Don't get me started on my problems...
    Oh, you meant the OP?
     
    Tim Holloway
    Saloon Keeper
    Posts: 27885
    Oh, sorry. I forgot who was asking about what.
     
    John Matthews
    Rancher
    Posts: 508
    I appreciate in your position you are looking at a lot more threads than I am. One is about my limit.