Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to Python 3.8 #36

Open
dhowland opened this issue Jan 5, 2018 · 38 comments
Open

Move to Python 3.8 #36

dhowland opened this issue Jan 5, 2018 · 38 comments
Labels
Milestone

Comments

@dhowland
Copy link

dhowland commented Jan 5, 2018

Python 2.7 is about to become obsolete. Could the scripting host be upgraded?

@chcg chcg added the Feature label May 16, 2018
@chcg
Copy link
Collaborator

chcg commented May 27, 2018

See sergey-shandar/getboost#43, currently no nuget packet seems to be available for boost python which is correctly build against python 3.

Initial test branch created at https://github.com/chcg/PythonScript/tree/python36_test.

See also http://python3porting.com/cextensions.html

chcg referenced this issue in chcg/PythonScript May 27, 2018
…ckage, due to missing boost python3 nuget package
chcg referenced this issue in chcg/PythonScript May 27, 2018
…ckage, due to missing boost python3 nuget package
chcg referenced this issue in chcg/PythonScript May 27, 2018
…ckage, due to missing boost python3 nuget package
@jacktose
Copy link

jacktose commented Aug 9, 2018

boostorg/python#129 is closed and apparently fixed as of April. That should fix sergey-shandar/getboost#43.

Edit: sergey-shandar/getboost#43 is closed now.

chcg referenced this issue in chcg/PythonScript Sep 14, 2018
…ckage, due to missing boost python3 nuget package
@chcg
Copy link
Collaborator

chcg commented Oct 3, 2018

Ideas on this task from Claudia:

Hast du noch ein paar nette Ideen zum Umstieg auf Python 3? Mit Boost 1.68 sollte der möglich sein.
Ich habe scintilla 4.0.1 mit python3 getestet und bis dato ist mir nur aufgefallen das Python-Text nach oder von bytes konvertiert
werden muß.
Heisst, mit Python Version2 war ein String standardmäßig ein Byte-Object, mit Python3 ist dies nun ein Unicode-Object, wenn also ein
Script nun etwas wie editor.addText('some text') macht, wird dies nicht funktionieren da es eigentlich editor.addText(b'some text')heißen
müsste. Das ist aber unschön aus Sicht des Users so dass ein automatisches prüfen und konvertieren nötig wäre.
Gleiches gilt für die PS-Console.

Oder sonst noch Gedanken, was geändert/verbessert werden könnte?
Man könnte ein Feature eibauen, welches dynamische generierte dialog erlaubt.
Siehe hierzu https://blogs.msdn.microsoft.com/oldnewthing/20050429-00/?p=35743
Wenn man das auf static, radio/checkbox und normale button sowie ein text control beschränkt sollte es auch mit
überschaubaren Aufwand realisierbar sein.
Die Idee war in etwa sowas:

editor.customDialog('Custom Dialog Title',
(x,y,w,h),
[('static', 'Convert document', x,y,w,h),
('radio', 'ALL TO UPPERCASE', x,y, w,h),
('radio', 'all to lowercase', x,y, w,h),
('button', 'OK - let's do it', x,y, w,h),
('button', 'Ignore and close', x,y, w,h),])

Das skript übergibt also den Titel, x,y Koordinaten sowie die Breite und Höhe des Dialogssowie eine Liste
von dynamisch generierten Controls. Deren x,y sollten relative zum Dialog sein. Also ein x,y = 0 von static
wäre der erste linke obere Punkt innerhalb des Dialogs und nicht auf dem Monitor.

Bezüglich Beispielskripten, die werde ich Dir noch zukommen lassen werde sie nur ein bisschen überarbeiten,
heißt, vorallem, dokumentieren.

@chcg chcg changed the title Move to Python 3.6 Move to Python 3.7 Oct 4, 2018
chcg referenced this issue in chcg/PythonScript Nov 7, 2018
…ckage, due to missing boost python3 nuget package
chcg referenced this issue in chcg/PythonScript Oct 19, 2019
…ckage, due to missing boost python3 nuget package
chcg referenced this issue in chcg/PythonScript Apr 26, 2020
update to python 3.8.2

- update tp boost 1.71
- fixed some x64 compiler warnings
- fix msi build

- updated python 3.7.5 lib data
- moved extra libs into full

python 3.7.5
add unicode filename testfile

update to python 3.7.3

update boost to 1.70

update to python 3.7.1 and boost 1.68

initial try of build adaptations for python 3 support, see #36 on blockage, due to missing boost python3 nuget package
chcg referenced this issue in chcg/PythonScript Apr 26, 2020
update to python 3.8.2

- update tp boost 1.71
- fixed some x64 compiler warnings
- fix msi build

- updated python 3.7.5 lib data
- moved extra libs into full

python 3.7.5
add unicode filename testfile

update to python 3.7.3

update boost to 1.70

update to python 3.7.1 and boost 1.68

initial try of build adaptations for python 3 support, see #36 on blockage, due to missing boost python3 nuget package
@chcg chcg added this to the v3.0 milestone Apr 26, 2020
@chcg chcg changed the title Move to Python 3.7 Move to Python 3.8 Apr 26, 2020
chcg added a commit that referenced this issue Apr 26, 2020
update to python 3.8.2

- update tp boost 1.71
- fixed some x64 compiler warnings
- fix msi build

- updated python 3.7.5 lib data
- moved extra libs into full

python 3.7.5
add unicode filename testfile

update to python 3.7.3

update boost to 1.70

update to python 3.7.1 and boost 1.68

initial try of build adaptations for python 3 support, see #36 on blockage, due to missing boost python3 nuget package
@Ekopalypse
Copy link
Contributor

Great to see that moving to py3 is going to be done.
I know it is early alpha state but should I be able to open the console already?
Or running some very basic scripts like notepad.new()?
When it comes to testing or migrating the scripts to python3 I can offer my help.

@chcg
Copy link
Collaborator

chcg commented May 9, 2020

Most of the unit tests are already running. I just added the correction to use the python modules provided with the plugin installation. So I think you could already start migrating the scripts.

The major drawback I think is just that non UTF-8 files are not working correctly yet as could be seen with still failing https://github.com/bruderstein/PythonScript/blob/master/PythonScript/python_tests/tests/ReplaceAnsiTestCase.py .
Text is written as b'sample text'. This is happening due to the change that for python 3 the internal representation of a string is utf-8 now and not a bytearray. So it is not yet clear to me, if it is easier to convert unicode back to byte array and have the same handling within pythonscript like before with python2.7 or convert the bytearrays coming e.g. from scintilla to unicode which propably causes some problems how to know the encoding.

@Ekopalypse
Copy link
Contributor

Yes, this is a area which needs to be addressed. What I found is needed is something like this

    cdef inline _encode(self, unicode text):
        return text.encode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode(self, char* text):
        return text.decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode_const(self, const char* text):
        return text.decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode_with_length(self, const char* text, size_t length):
        return text[:length].decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')

the encode returns a bytestring whereas the decode, obviously, returns unicode.
That would mean, that each scintilla call which expects an char array would need such an conversion. As an example the usage from appendText from the point of a script

editor.appendText('Hello World') <-- I would suggest using this instead of forcing users to do
editor.appendText(b'Hello World')

but that would mean that the appendText call needs to do something like

cpdef void appendText(self, unicode text):
        ''' Append a string to the end of the document without changing the selection. '''
        encoded = self._encode(text)
        self.call(<sptr_t>self.sci_pointer, SCI_APPENDTEXT, <uptr_t>(len(encoded)), <sptr_t><char*>encoded)

Yes, this is a little overhead but using b'' strings doesn't feel natural for python3 developers

and of course getText is something like

    cpdef unicode getText(self):
        ''' Retrieve all the text in the document. Returns number of characters retrieved. Result is NUL-terminated. '''
        buffer_size = self.call(<sptr_t>self.sci_pointer, SCI_GETTEXT, 0, 0)
        if buffer_size:
            buffer = bytes(buffer_size+1)
            self.call(<sptr_t>self.sci_pointer, SCI_GETTEXT, <uptr_t>buffer_size, <sptr_t><char*>buffer)
            return self._decode(buffer[:-1])
        return ''

Unfortunately I don't know if boost::python provides a way to make this work easily.

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 9, 2020

I tried the latest build with a fresh npp 7.8.6 and it crashed.
Then I copied, from the old 2.7 installation, the PythonScript related files and directories
from .\plugins\Config to the new npp and it still crashed.
Finally I renamed lib from .\plugins\PythonScript to no_lib and copied an embeddable python382 into
.\plugins\PythonScript and was able to start npp.

image

The string(unicode/bytestring) conversions seems already to work.

Concerning the crash I assume something is missing in lib. I will check this out.
I'm going to work on the provided example scripts.

@chcg
Copy link
Collaborator

chcg commented May 10, 2020

@Ekopalypse Did you use 3.0.0 or already 3.0.1? 3.0.1 should fix the problem starting with the locally provided python modules from the pythonscript zip file.

@Ekopalypse
Copy link
Contributor

What I've described was done with 3.0.1, sorry for not being clear.
Btw. most of the samples are done. The remaining are depending on the text encoding conversion.
Do you want me to make PR for the fixed ones already or should we wait until the encoding stuff
got resolved?

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 12, 2020

@chcg - I don't know why but the first time I tested 3.0.1 it didn't work with the package I've downloaded but today with a clean npp and PS setup (the zipped x86 and x64) everything was ok, I mean from setup point of view.
I rechecked the lib directory, the one I renamed to no_lib and found that all of the python extension modules are missing, only the python files are there and most of the directories weren't created. Can't say why this happened but, as already said, today x86 and x64 zip worked.

By the way, with Python3 there is, although not officially supported by python, a way to use pip,
pythons package manager, together with an embedded distribution.
This could reduce the number of release packages to 2 (or 4 if the two source packages are counted as well), a x86 minimum and x64 minimum package.
Let me know if this is something which is worth investigating.

@chcg
Copy link
Collaborator

chcg commented May 12, 2020

@Ekopalypse Good news that 3.0.1 fixed what was expected to be fixed.

  • pip within the provided package would be of interest I think. The only issue I see is that an update of the PythonScript plugin by PluginAdmin probably removes also the modules installed via pip.

  • if you like to provide the corrected scripts already now that would be fine for me. Could you also add a list which scripts still need to be adapted and are now blocked due to the encoding problem.

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 14, 2020

@chcg
I will do a final test with all samples using utf8 and ansi encoded buffers, just to be sure I haven't overlooked something. Do you think it is needed to test utf-8 bom and ucs2 ... as well?
After that I will create the list of the failing sample scripts.

Concerning pip, do you see any chance that I can simulate the PluginAdmin workflow?

@cmeriaux
Copy link

cmeriaux commented May 14, 2020

@Ekopalypse
it's possible to simulate the pluginAdmin workflow. You have to build notepad++ in debug mode. Then you can customize the json file to point to a temporarily github release
json file:

<NPP_INST_DIR>\plugins\Config\nppPluginList.json

More info here: https://npp-user-manual.org/docs/plugins/#plugins-admin

@cmeriaux
Copy link

By the way, your work on this feature request is awesome and very much appreciated.
It's not possible for my limited software skill to help on the dev, but I'm volonture to be beta tester.

Good job guys

@chcg
Copy link
Collaborator

chcg commented May 14, 2020

See https://github.com/notepad-plus-plus/notepad-plus-plus/blob/ade01204c8800c6117bdc72eb476de4226adf4fa/PowerEditor/src/WinControls/PluginsAdmin/pluginsAdmin.cpp#L455

and

https://github.com/notepad-plus-plus/wingup/blob/master/src/winmain.cpp#L957

what happens on a update. So from my understanding the existing dir is moved to backup, replaced by the zip and on success the backup is deleted. This would make pip data disappear on each update, if it is installed in the pythonscript plugin dir.

@Ekopalypse
Copy link
Contributor

So the obvious solution would be to have pip and all installed modules in the plugins\config directory,
which means site.py module needs to be changed/adapted I guess. Let me think about it.

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 16, 2020

My current state is the following:

  • all scripts are python3 ready
  • most scripts will work with unicode and ansi files if those use the chars from the ascii table only.
    exception is HideLines which does have a problem with the markers

image

whereas it should look like this

image

That means, that all scripts, except of Disable Virtual Space, Enable Virtual Space, GotoLineCol_Barebones, InsertRuler and StartWithLocalPthon, do have the issues that

with utf8 enocded files they might run into an offset error

image

Note, the rectangle should include the ending bracket.

and with ansi files they throw an UnicodeDecodeError: 'utf-8' codec can't decode byte

beside this, the following has been observed.

CTags Based Autocompletion
utf8 encoded tags file:
utf8: completion list shows different encoding, offset issue
ansi: completion list shows different encoding, offset issue
ansi encoded tags file:
utf8: offset issue
ansi: completion list shows different encoding, offset issue

Formatter
utf8: last line wrong
ansi: decoding error

MultiEdit
utf8: ok
ansi: decoding error

Multiples_SR
utf8: doesn't work when using special chars
ansi: doesn't work when using special chars

RegexTester
utf8: ok
ansi: doesn't find special ansi characters

GotoLineCol
utf8: offset error
ansi: special char cannot be detected correctly

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 16, 2020

I'm trying to find out what the issue with the HideLines and the Formatter scripts, corrupts the last line to format, are because these errors doesn't seem to be encoding related.

@Ekopalypse
Copy link
Contributor

Issue with Formatter script has been identified. Here and here

it must be

endpos = editor.getLineEndPosition(...)

@Ekopalypse
Copy link
Contributor

I assume the problem with HideLines is not within the script but within the getStringFromObject function. From what I understand it returns the raw string, as it is typed, instead of the bytes.

@Ekopalypse
Copy link
Contributor

I guess a picture explains it better than what I wrote
image

@Ekopalypse
Copy link
Contributor

By changing the code for getStringFromObject to

std::string ScintillaWrapper::getStringFromObject(boost::python::object o)
{
	std::string raw;
	if (PyUnicode_Check(o.ptr()))
	{
		boost::python::object utf8Text = o.attr("__str__")();
		raw = std::string(boost::python::extract<const char *>(utf8Text));
	}
	else if(PyBytes_Check(o.ptr()))
	{
		raw.assign(PyBytes_AsString(o.ptr()), PyBytes_Size(o.ptr()));
	}
	else
	{
		boost::python::object rawString = o.attr("__str__")();
		raw = std::string(boost::python::extract<const char *>(rawString), _len(rawString));
	}

	return raw;
}

HideLines script works as well as adding byte strings

image

But I'm not sure if this is the way to go as this is working around the boost::python lib, isn't it?

@chcg
Copy link
Collaborator

chcg commented May 26, 2020

I did a similar change:

raw = std::string(PyBytes_AsString(o.ptr()));

for the byte case. The else case is probably not working any more. The issues which I faced with this are happening on reading back data from scintilla and setting the right ANSI encoding.
At least it would fix some of the remaining problems. I will add it.

@Ekopalypse
Copy link
Contributor

Ekopalypse commented May 26, 2020

If I understand the documentation correctly, using PyBytes_AsString without a length argument breaks HideLines, because the string would only be returned until the first \0 is received.

The problem with reading text as in getText, getTextRange ... could be solved by using PyUnicode_FromEncodedObject, I guess. The second argument is the encoding.

The question I ask myself is whether this is not already somehow covered in boost::python.
I mean, this is something everyone has to deal with. I can't find an example of that. I found some
complaints and a [PR] (boostorg/python#54), which is still open, but I don't understand enough to see if this would solve the problem.

By the way, are you trying to solve it for every possible ANSI encoding? I mean, isn't it enough to use ANSI aka what is the current system locale aka what Python calls mbcs?

Update: PyUnicode_FromEncodedObject might not be the right choice because if the encoding is set to, lets say, cp1252 then a script would expect a string like ö to be encoded as b'\xf6'
but I assume PyUnicode_FromEncodedObject would return b'\xc3\xb6', the utf8 representation.

@pete312
Copy link

pete312 commented Jul 7, 2020

How can we start using python3x ? These edge cases will not affect me.

@Ekopalypse
Copy link
Contributor

@pete312
download a zip from https://github.com/bruderstein/PythonScript/releases/tag/v3.0.1.
Depending on your notepad++ configuration you either have to unpack into %APPDATA%\notepad++ or
into the installation directory of notepad++.

@vsrawat
Copy link

vsrawat commented Jul 7, 2020 via email

@Ekopalypse
Copy link
Contributor

@vsrawat
I don't know as I don't use the installed version.

vinsworldcom pushed a commit to vinsworldcom/PythonScript that referenced this issue Nov 21, 2023
Fixed compilation and object file linkage of test project
vinsworldcom pushed a commit to vinsworldcom/PythonScript that referenced this issue Nov 21, 2023
@pryrt
Copy link
Contributor

pryrt commented Dec 4, 2024

@chcg,

I know I've generally been told that "ANSI/codepages" are the reason that PythonScript 3 has never moved out of Alpha. But there has been nothing on this issue in 4 years, other than a comment in 2023 that this issue was the tracker for PS3.

The original list of test failures in ANSI were fixed years ago. So what remains to be solved? What is still holding up the transition from PS3 being alpha to releasing it into Plugins Admin? Is there an actionable and testable list of things that still remain to be fixed/implemented, so that if someone wanted to contribute to that part of the effort, they would know what is needed?

(It can be frustrating in the Forum to have to support both PS2 and PS3 for scripts that I publish, because while I already made the transition to PS3 for my own use, the average Forum member will only install whatever's available in PluginsAdmin, so any public scripts have to work for PS2 as well, or average Forum members aren't likely to be able to use them. Every time I have to fix something, like here, because of differences in behavior or features between Python2.7+libraries and Python3.12+libraries, I am reminded yet again of how much I really wish PS3 were in Plugins Admin rather than just alpha.)

@Ekopalypse
Copy link
Contributor

Ekopalypse commented Dec 4, 2024

The issue still exists

{1D79F866-BD94-467B-B795-B3730AC37B1B}

and is not easy to fix, I think.
PS must treat the cases where a buffer is ANSI, DBCS or UNICODE differently.
In the old PS2 everything was a byte buffer and python2 created, or rather tried to create, a string from it.
Now Python3 assumes that it is a utf8 encoded byte stream, which means that another encoded buffer would have to be converted to utf8 or converted from utf8 to its original encoding.
This is feasible in itself, but what happens if you use actions based on different encodings?
For example, get a substring of X and mark it. Different length!
My attempt to act as a bytestream again has unfortunately not yet worked.

And I can understand your frustration.

@chcg
Copy link
Collaborator

chcg commented Dec 5, 2024

Failing tests are internal ones from
https://github.com/bruderstein/PythonScript/blob/master/PythonScript/python_tests/tests/

The main issue is within

//TODO how to get str.attr("encode")(codePageName) working again here

Most likely to overcome that problem the code needs a switch to internally handle the text as byte buffer again and not as python 3 does as utf8 stream.

@alankilborn
Copy link

Questions:

  • Could "PythonScript" and (a new) "PythonScript3" both appear in Plugins Admin? The answer is "they could" but the real question is, would that be reasonable?
  • Can't it just be made known to only use PS3 on UTF-8 files? Those publishing PS3 scripts could assert editor.getCodePage() == 65001 early on in the script...

@chcg
Copy link
Collaborator

chcg commented Dec 30, 2024

  • Using PythonScript in parallel to PythonScript3 requires renaming of the install folder and the plugin dll on master.
    I think it is not worth the effort to support both versions and would prefer to just switch to PythonScript3 and see how big is the user complain on it.

@Ekopalypse
Copy link
Contributor

As far as I understand, even with the latest Python version it is not possible to host multiple runtimes in one process

@alankilborn
Copy link

even with the latest Python version it is not possible to host multiple runtimes in one process

Was I or anyone else suggesting this?

@Ekopalypse
Copy link
Contributor

If there will be two versions in the PluginAdmin, there will be people who install both and then ...

@alankilborn
Copy link

Well, appropriate warnings should be added to description in PluginAdmin...

And really, maybe a deprecation warning for PS2 (I believe it is behind in new features and functions anyway).

@chcg
Copy link
Collaborator

chcg commented Dec 31, 2024

What I have in mind is replacing PythonScript(2) in PluginAdmin with PythonScript(3), so it appears as update and the old PS(2) installation is replaced and there should be no issue with a parallel installation of both variants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants