txt encoding

  • This topic has 24 replies, 7 voices, and was last updated Feb 25-5:43 pm by andfree.
Viewing 15 posts - 1 through 15 (of 25 total)
  • #99915
    Member
    andfree

      Hi. I am trying to open a txt file that I created on a laptop running an old version of Windows, but there is an encoding problem. I tried “Set Encoding” in Geany, but I don’t see any difference.

      #99918
      Member
      sybok

        Hi, I’ve used the console-based tools ‘enca’ (detection; it needs a sufficient number of characters to make a reliable estimate) and ‘iconv’ (conversion) in the past.

        #99922
        Member
        RJP

          To check the filetype, run

          ls
          file filename-here.txt
          #99923
          Member
          andfree

            Thank you. I installed enca, I read this & I tried the commands below:

            $ file mg.txt
            mg.txt: ISO-8859 text, with no line terminators
            
            $ enca -L none mg.txt
            Unrecognized encoding
            
            $ iconv --from-code=ISO-8859 --to-code=UTF-8 mg.txt > mg1.txt
            iconv: failed to start conversion processing
            
            $ iconv --from-code=ISO-8859-1 --to-code=UTF-8 mg.txt > mg1.txt

            The last command created a new “mg1.txt” file, but, although this file appears to be “UTF-8 Unicode text, with no line terminators”, the displayed text seems to be the same.
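
A minimal sketch of why the last command can succeed without helping, assuming GNU iconv: ISO-8859-1 assigns a character to every byte value, so the conversion never fails, it just produces accented Latin letters. The three bytes below are illustrative and are not read from mg.txt. ISO-8859-7 (Greek) is used only as an example of another single-byte encoding that gives the same bytes a different meaning.

```sh
# Three illustrative bytes (hex e1 f0 ef), written as octal escapes.
sample_bytes() { printf '\341\360\357'; }

# Read as ISO-8859-1, each byte becomes an accented Latin letter.
sample_bytes | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1   # c3 a1 c3 b0 c3 af

# Read as ISO-8859-7, the same bytes become Greek letters instead.
sample_bytes | iconv -f ISO-8859-7 -t UTF-8 | od -An -tx1   # ce b1 cf 80 ce bf
```

Both runs exit successfully, so a clean exit from iconv does not show that the source encoding was guessed correctly.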

            #99935
            Member
            Robin

              In Geany, you can set line endings separately from the encoding, both in the Document menu. The available settings for a document are CR/LF (classic Windows style), LF (Unix, the default in antiX), and CR (classic Mac OS style). To see which line endings your original file actually uses, tick the “Show Line Endings” checkbox in the View menu.

              Additionally check
              iconv -l
              to see a complete list of available input (and output) encodings. Try some of the WINDOWS-xxxx encodings instead of ISO-8859, or try some of the ISO-8859-x encodings.

              Windows is like a submarine. Open a window and serious problems will start.
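
The advice to try several encodings can be scripted. The following is a minimal sketch, assuming GNU iconv. The function name try_encodings, the candidate list, and the stand-in sample bytes are hypothetical and should be replaced with names from iconv -l and the real file. Note that “ISO-8859” on its own is a family of encodings rather than a name iconv accepts, which is presumably why --from-code=ISO-8859 failed to start.

```sh
# try_encodings FILE: print the first line of FILE under several candidate encodings.
try_encodings() {
    for enc in ISO-8859-1 ISO-8859-7 WINDOWS-1252 WINDOWS-1253; do
        printf '%-13s ' "$enc"
        # The pipeline's status is iconv's, so a rejected byte triggers the echo.
        head -n 1 "$1" | iconv -f "$enc" -t UTF-8 2>/dev/null || echo '(rejected)'
    done
}

# Stand-in for mg.txt so the sketch runs on its own (nine arbitrary high bytes).
sample_file=$(mktemp)
printf '\341\360\357\364\361\335\360\345\351\n' > "$sample_file"
try_encodings "$sample_file"
rm -f "$sample_file"
```

The candidate whose output reads as real words in the expected language is the likely source encoding. Several candidates may convert without error, so the result has to be judged by eye.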

              #100013
              Member
              andfree

                Thank you.

                Robin wrote: “In Geany, you can set line endings separately from the encoding, both in the Document menu.”

                It doesn’t seem to make any difference.

                Robin wrote: “Try some of the WINDOWS-xxxx encodings instead of ISO-8859, or try some of the ISO-8859-x encodings.”

                For these encodings the output is “iconv: illegal input sequence at position x”. Only WINDOWS-1252 produced a new file, but it doesn’t make any difference.

                #100016
                Member
                sybok

                  What is the output of ‘enca’ on the newly created file(s)?

                  This situation reminds me of encoding troubles occasionally experienced with downloaded subtitles…

                  BTW, there is a utility ‘flip’ to switch between different types of line endings.

                  #100018
                  Member
                  andfree

                    sybok wrote: “What is the output of ‘enca’ on the newly created file(s)?”

                    $ enca -L none mg1.txt
                    Unrecognized encoding

                    sybok wrote: “There is a utility ‘flip’ to switch between different types of line endings.”

                    $ flip -u mg.txt
                    mg.txt: binary file, not converted
                    
                    $ flip -m mg.txt
                    mg.txt: binary file, not converted
                    
                    $ flip -ub mg.txt

                    The last command seems to have converted the initial file, but this doesn’t seem to have helped.
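
To see which line endings a file actually contains before converting anything, the carriage-return and line-feed bytes can simply be counted. The following is a minimal sketch using POSIX tr and wc. The function name count_eol and the demo file are hypothetical.

```sh
# count_eol FILE: print how many CR (\r) and LF (\n) bytes FILE contains.
count_eol() {
    # LC_ALL=C makes tr work on raw bytes, whatever the file's encoding is.
    cr=$(( $(LC_ALL=C tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(LC_ALL=C tr -cd '\n' < "$1" | wc -c) ))
    echo "CR=$cr LF=$lf"
}

# Stand-in file with two Windows-style (CR LF) line endings.
demo=$(mktemp)
printf 'one\r\ntwo\r\n' > "$demo"
count_eol "$demo"   # CR=2 LF=2
rm -f "$demo"
```

Equal counts suggest CR LF (Windows), LF only suggests Unix, and CR only suggests classic Mac OS. Line endings are independent of the character encoding, so changing them would not be expected to alter how the letters display.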

                    #100021
                    Member
                    PPC

                      Hi, I have a simple, low-tech suggestion: open a file that is encoded the way you want your target file to be encoded and save it under a new name (example: new.txt). Open the target file. Copy its contents and paste them into the new.txt file. Save the new file. Close the files, then try to open the new.txt file.
                      This always works for me… Not ideal, but effective.

                      If all else fails, try opening the .txt file in LibreOffice Writer…

                      P.

                      #100111
                      Member
                      andfree

                        Thank you. I’m not sure I understood. I created an empty “new.txt” file. I opened it with Geany and noted that its encoding was already UTF-8. I opened the mg.txt file, copied its content, pasted it into the new.txt file, saved the new.txt file, closed both files & opened the new.txt file again. Unfortunately, it doesn’t seem to have worked.
                        The encoding problem remains when I open the file with LibreOffice Writer & I can’t find how to set the encoding there.

                        #100117
                        Member
                        sybok

                          Could you please post a screenshot of what you see in LibreOffice (LO), if the content is not too private?

                          Selecting encoding with LO: https://ask.libreoffice.org/t/how-do-you-change-the-encoding-for-certain-file-types/52833
                          It seems that you may also try to convert encoding of a file using LO: https://unix.stackexchange.com/questions/259361/specify-encoding-with-libreoffice-convert-to-csv

                          #100118
                          Member
                          zblsv

                            andfree, can you show the contents of the file? At least part of it.
                            The following command converts the binary data to Base64, so the output can be posted here as is:
                            dd bs=56 count=1 status=none if=mg.txt | base64
                            It outputs the first 56 bytes (the value of the bs parameter).

                            Words are carried away by the wind...

                            #100289
                            Member
                            andfree

                              Thanks for the replies. I attach a sample of how it looks in LO and how it looks in Leafpad (or Geany). The language is Greek.
                              I can’t see how “File -> Open and then select Text or Text – Choose Encoding as the file type” helps me to set the encoding.

                              $ dd bs=56 count=1 status=none if=mg.txt | base64
                              4fDv9PHd8OXpCurh9OHq8d7s7enz5wrz+evn7dzx6eEK4fXu5+zd7efyCurh9OHt3Ov58+fyCuM=
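
Since the language is Greek, one plausible reading of the sample is ISO-8859-7 (or the closely related WINDOWS-1253). The following is a minimal sketch, assuming GNU coreutils base64 and GNU iconv, that decodes the Base64 string posted above and converts it to UTF-8. The choice of ISO-8859-7 is an assumption, not something confirmed in the thread at this point.

```sh
# Base64 sample copied from the post above (first 56 bytes of mg.txt).
sample='4fDv9PHd8OXpCurh9OHq8d7s7enz5wrz+evn7dzx6eEK4fXu5+zd7efyCurh9OHt3Ov58+fyCuM='

# Turn the Base64 text back into raw bytes, then reinterpret them as Greek.
decoded=$(printf '%s' "$sample" | base64 -d | iconv -f ISO-8859-7 -t UTF-8)
printf '%s\n' "$decoded"
```

Under these assumptions the first line comes out as αποτρέπει, which is a Greek word, so the same source encoding is worth trying on the whole file: iconv -f ISO-8859-7 -t UTF-8 mg.txt > mg1.txt
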
                              #100293
                              Member
                              sybok

                                With the File -> Open … selection in LO, you select the input file’s encoding yourself. The expectation is that LO would then display the text correctly.

                                Perhaps something is missing from your antiX system, or it is down to how (old) Windows handles encoding.

                                The first thing that comes to mind is fonts (which reminds me of a poster announcing Hitchcock’s “The Birds is coming” 🙂 ), https://packages.debian.org/stable/fonts/
                                But shouldn’t this be handled by Unicode?
                                It turns out my understanding of these matters is … in need of expanding.

                                #100300
                                Member
                                andfree

                                  sybok wrote: “With the File -> Open … selection in LO, you select the input file’s encoding yourself. The expectation is that LO would then display the text correctly.”

                                  Does this mean that it would automatically select the appropriate encoding, or that there would be a list of encodings for me to select from? I can’t see such a list.
