Ticket #2699 (new defect)

Opened 2 years ago

Last modified 5 months ago

build nevow without importing nevow

Reported by: zooko Assigned to: glyph
Priority: normal Milestone:
Component: Epsilon Severity: normal
Keywords: Cc: zooko, zooko@zooko.com, zookog@gmail.com
Estimated Completion (YYYY/MM/DD): Branch: branches/build-nevow-without-importing-2699
Author: exarkun

Description

Nevow's setup.py (through setupcommon.py) imports nevow in order to learn its version number. This can be problematic since nevow isn't actually installed yet at the time that its setup.py is executed, so in some use cases it can end up importing "the wrong nevow".

To test this we would have to put a known-bad Nevow, possibly just one that raises RuntimeError?("lose") when you import it, on the path, and then make sure that building Nevow doesn't import it.

I think that the way Nevow's setup.py could learn the version number without importing nevow would be by parsing the nevow/_version.py file -- that's what we do in Tahoe -- http://allmydata.org/trac/tahoe/browser/setup.py?rev=2756#L72 -- so I would be happy to do that again since I know how to do it and I know that it works. PJE has also suggested to use execfile -- http://bugs.python.org/setuptools/issue20

Change History

08/27/08 10:32:44 changed by zooko

Feel free to assign this ticket to me.

09/01/08 16:00:02 changed by zooko

Please assign this ticket to me. I'm now working on a patch.

10/22/08 18:13:07 changed by zooko

I submitted a patch which fixes #2527. If that works, then hopefully I will soon submit a patch that tests and fixes this ticket.

10/23/08 07:51:01 changed by zooko

Wow, this turns out to be harder than I thought.

The goal is to setup.py sdist nevow, which requires finding its version number to put into the distutils metadata, without importing nevow (which causes problems in some cases, including when packaging Nevow with py2exe and when installing Nevow and Twisted simultaneously with a single call to easy_install). Ideally we would also avoid importing twisted during the run of setup.py, so that people who have neither Twisted nor Nevow installed can install both at once with "easy_install Nevow". http://bugs.python.org/setuptools/issue20

The first thought I had was to have setup.py just read and parse nevow/_version.py. Unfortunately, it turns out that this won't quite work because the SVN revision number isn't in that file. I could, of course, write code into setup.py that parses nevow/_version.py and also code that parses .svn/entries (the way that twisted.python.versions does, but let's try simpler things first.

The next idea was suggested by PJE, the setuptools guy, use execfile on nevow/_versions.py. Unfortunately this doesn't work because twisted/_version.py uses __name__, which is equal to nevow._version when it is being imported, but not when it is being execfiled. I tried passing in a locals dict with its __name__ set to "nevow._version", but that causes an internal error in the Python interpreter. :-)

Okay, so now we're back to the question of how to get the SVN revision number into the setup.py. Here is a patch that makes setup.py parse the version numbers out of the _version.py file, but not the SVN revision count.

(Oh, this patch is on top of my patch that refactors setup.py for ticket #2527, so there is only setup.py, and not setup.py, setupcommon.py and setup_egg.py.)

HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ diff -u setup.py.orig setup.py
--- setup.py.orig       2008-10-23 06:47:48.000000000 -0600
+++ setup.py    2008-10-23 06:50:44.000000000 -0600
@@ -1,6 +1,13 @@
 #!/usr/bin/python
 
-from nevow import __version__ as version
+import os, re
+
+RE=re.compile("versions.Version\((.*?),(.*?),(.*?),(.*?)(,.*)?\)")
+for line in open(os.path.join("nevow", "_version.py")):
+    mo = RE.search(line)
+    if mo:
+        version = ".".join([x.strip() for x in mo.groups()[1:] if x and x.strip()])
+        break
 
 try:
     import setuptools

10/23/08 08:21:02 changed by zooko

So the way to test this would be to make a bogus "nevow" package that raises an exception and then run a python script which puts the bogus package earlier on the sys.path and then execfiles setup.py. This simulates pretty exactly what py2exe and easy_install do -- execfile on the package's setup.py as part of the process of installing it.

I can't submit a patch that runs this test because it would be a patch to the buildbot config rather than to the nevow source code, but here is a transcript:

HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ mkdir bogus
HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ cat > bogus/nevow.py
raise RuntimeError("bogus")
HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ python -i
Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22) 
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, "bogus")
>>> execfile('setup.py')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "setup.py", line 3, in <module>
    from nevow import __version__ as version
  File "bogus/nevow.py", line 1, in <module>
    raise RuntimeError("bogus")
RuntimeError: bogus

And after applying the patch from this ticket to get the version number by sucking it out of nevow/_version.py instead of by importing nevow, we get:

HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ python -i
Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22) 
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, "bogus")
>>> execfile('setup.py')
usage:  [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or:  --help [cmd1 cmd2 ...]
   or:  --help-commands
   or:  cmd --help

error: no commands supplied

Success! Except for the darn SVN revision numbers.

Now, setuptools happens to have a feature of figuring out the SVN revision number and appending it onto your sdist filename for you:

$ ./setup.py egg_info --tag-svn-revision 
$ grep ^Version Nevow.egg-info/PKG-INFO 
Version: 0.9.32-r16866

I notice also that setuptools's method of figuring out the SVN revision seems to be a little different from Twisted's. If you check out trunk, and run _getSVNVersion of twisted.python.versions.Version, and then someone checks in a patch to a branch, and then you check out trunk and run it again you'll get a new revision number. If you use the setuptools SVN-revision-sucker then you'll get the same revision number unless the checkin touched trunk (if I understand correctly) For example, right now I get 16896 on trunk using twisted's SVN-revision-sucker -- 16896 was a recent commit that touched a branch -- and I get 16866 using setuptools's -- 16866 was the most recent commit that touched trunk.

However, I doubt that the Nevow folks want to lose the ability to fetch SVN revision numbers without the use of setuptools, even if that would make Nevow more easy to package in py2exe and easy_install and cetera.

So, I think my next proposal will be that I copy some of the SVN-entry-parsing-code from twisted.python.versions into Nevow's setup.py.

10/23/08 08:22:25 changed by zooko

<Zooko pauses to see if the Nevow maintainers would rather start depending on setuptools's SVN-revision-sucker feature instead of, or in alternative to, Twisted's...>

10/23/08 08:48:07 changed by zooko

Okay, since the setuptools approach appeared to me to produce "better" svn revisions than the twisted one, I copied the setuptools one. Here is a patch which seems to make the nevow setup.py pass the "bogus nevow" test and produce nice version numbers including the most recent SVN revision that touched this branch.

HACK yukyuk:~/playground/nevow/trunk-plus-zookopatches$ diff -u setup.py.orig setup.py
--- setup.py.orig       2008-10-23 06:47:48.000000000 -0600
+++ setup.py    2008-10-23 07:43:35.000000000 -0600
@@ -1,6 +1,50 @@
 #!/usr/bin/python
 
-from nevow import __version__ as version
+import os, re
+
+def get_epsilon_version():
+    RE=re.compile("versions.Version\((.*?),(.*?),(.*?),(.*?)(,.*)?\)")
+    for line in open(os.path.join("nevow", "_version.py")):
+        mo = RE.search(line)
+        if mo:
+            return ".".join([x.strip() for x in mo.groups()[1:] if x and x.strip()])
+
+def get_svn_revision():
+    revision = 0
+    urlre = re.compile('url="([^"]+)"')
+    revre = re.compile('committed-rev="(\d+)"')
+
+    for base,dirs,files in os.walk(os.curdir):
+        if '.svn' not in dirs:
+            dirs[:] = []
+            continue    # no sense walking uncontrolled subdirs
+        dirs.remove('.svn')
+        f = open(os.path.join(base,'.svn','entries'))
+        data = f.read()
+        f.close()
+
+        if data.startswith('9') or data.startswith('8'):
+            data = map(str.splitlines,data.split('\n\x0c\n'))
+            del data[0][0]  # get rid of the '8' or '9'
+            dirurl = data[0][3]
+            localrev = max([int(d[9]) for d in data if len(d)>9 and d[9]]+[0])
+        elif data.startswith('<?xml'):
+            dirurl = urlre.search(data).group(1)    # get repository URL
+            localrev = max([int(m.group(1)) for m in revre.finditer(data)]+[0])
+        else:
+            print "unrecognized .svn/entries format; skipping %s" % (base,)
+            dirs[:] = []
+            continue
+        if base==os.curdir:
+            base_url = dirurl+'/'   # save the root url
+        elif not dirurl.startswith(base_url):
+            dirs[:] = []
+            continue    # not part of the same svn tree, skip it
+        revision = max(revision, localrev)
+
+    return str(revision)
+
+version = get_epsilon_version() + "-r" + get_svn_revision()
 
 try:
     import setuptools

The disadvantage, obviously, is having this big function in the setup.py. Alternatives include relying on the setuptools's --tag-svn-revision feature, losing the ability to tag nevow sdists with SVN revisions, and, oh here is a third alternative: you can import Twisted in the Nevow setup.py even though you can't import Nevow in the Nevow setup.py (without making the heads of some other tools like py2exe and easy_install explode). So, a third alternative is use the Twisted svn-sucker feature from the Nevow setup.py. I'm not sure precisely how to do that, and this patch satisfies all of my needs, so I hope you apply it.

10/25/08 08:10:39 changed by zooko

To summarize, the current nevow setup.py assumes that sys.modules and sys.path are in such a state that running import nevow will load the ./nevow from the filesystem that it is actually trying to build. These assumptions always hold when someone is invoking ./setup.py from the command-line, but they don't necessarily hold when a tool is building or packaging nevow, such as easy_install and py2exe do.

The easy_install guy, PJE, responded to my bug report about this by committing a patch that removes unneeded entries from sys.modules during the process of installing stuff. This makes the current nevow's assumption that import nevow will get what it wants true for easy_install. The fix has not been released in setuptools-0.6c9, but it is in setuptools SVN trunk. The py2exe issue can be worked-around by removing any installed versions of nevow from your operating system before trying to use py2exe to package nevow.

A better fix would be for the nevow setup.py to require less from its environment. The only thing that it needs as far as versioning is concerned is a string, like "0.9.32-r98765", which it passes to setup(). Requiring that the sys.modules and sys.path are set up so that a full working nevow and twisted can be imported is a pretty heavy way to get that string. The patch in this ticket ( http://divmod.org/trac/ticket/2699#comment:7 ) is one way to get that string, including to find the most current SVN revision at the time that setup.py is evaluated. Another way would be to make epsilon write out a flat string, such as nevow/_version_string.txt, and have setup.py simply do version = open(os.path.join("nevow", "_version_string.txt"), "rU")).read().strip().

This issue is no longer blocking me from deploying Nevow with allmydata-tahoe, because I can use an unreleased SVN snapshot of setuptools which removes "nevow" from the sys.modules at a key juncture. :-)

10/25/08 08:48:44 changed by zooko

Ah, alas, what I just said about being able to get past this issue using the SVN snapshot of setuptools turns out not to be true. This is because, although that patch does make it so that setup.py can import nevow and get the right one, setup.py also imports twisted. Hm...

A-ha! I can lie to setuptools and tell it that tahoe requires twisted for setup. That way, setuptools will install twisted and make it importable before proceeding, and that way it will be importable when setuptools gets around to setting up nevow.

It works! (At least on my Mac OS X laptop.) Hooray!

Of course, this means another project that tries to use setuptools/easy_install to automatically satisfy its requirement of Nevow will probably have this same problem unless they have setup_requires=['Twisted >= 2.4.0'] in their setup.py. Also they have no hope at all until the next stable release of setuptools.

11/18/08 06:51:37 changed by zooko

  • completion_estimate_date changed.

I'm waiting to see what happens with #2527 (easy_install compatibility) before doing more work on this ticket.

12/26/08 13:02:18 changed by zooko

I just encountered this issue. I used the first patch in the comments of this ticket (repeated below) to get around the problem. Here is the annotated transcript:

~/playground/nevow/Nevow$ # Get the current trunk of Nevow
~/playground/nevow/Nevow$ svn up
At revision 17110.
~/playground/nevow/Nevow$ # Make a source distribution.
~/playground/nevow/Nevow$ python ./setup.py sdist
Traceback (most recent call last):
  File "./setup.py", line 3, in <module>
    from nevow import __version__ as version
  File "/Users/wonwinmcbrootles/playground/nevow/Nevow/nevow/__init__.py", line 10, in <module>
    from twisted.python.components import registerAdapter
  File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/twisted/python/components.py", line 37, in <module>
    from zope.interface import directlyProvides, interface, declarations
ImportError: No module named zope.interface
~/playground/nevow/Nevow$ # Apply the first patch from ticket #2699
~/playground/nevow/Nevow$ patch -p0
--- setup.py.orig       2008-10-23 06:47:48.000000000 -0600
+++ setup.py    2008-10-23 06:50:44.000000000 -0600
@@ -1,6 +1,13 @@
 #!/usr/bin/python

-from nevow import __version__ as version
+import os, re
+
+RE=re.compile("versions.Version\((.*?),(.*?),(.*?),(.*?)(,.*)?\)")
+for line in open(os.path.join("nevow", "_version.py")):
+    mo = RE.search(line)
+    if mo:
+        version = ".".join([x.strip() for x in mo.groups()[1:] if x and x.strip()])
+        break

 try:
     import setuptools
patching file setup.py
~/playground/nevow/Nevow$ # Make a source distribution.
~/playground/nevow/Nevow$ python ./setup.py sdist > log.txt
~/playground/nevow/Nevow$ echo $?
0
~/playground/nevow/Nevow$ tail log.txt
hard linking nevow/test/test_testutil.py -> Nevow-0.9.33/nevow/test
hard linking nevow/test/test_url.py -> Nevow-0.9.33/nevow/test
hard linking nevow/test/test_useragent.py -> Nevow-0.9.33/nevow/test
hard linking nevow/test/test_utils.py -> Nevow-0.9.33/nevow/test
hard linking nevow/test/testsupport.js -> Nevow-0.9.33/nevow/test
hard linking twisted/plugins/nevow_widget.py -> Nevow-0.9.33/twisted/plugins
hard linking win32/nevow.nsi -> Nevow-0.9.33/win32
tar -cf dist/Nevow-0.9.33.tar Nevow-0.9.33
gzip -f9 dist/Nevow-0.9.33.tar
removing 'Nevow-0.9.33' (and everything under it)
~/playground/nevow/Nevow$ ls -ald dist/*
-rw-rw-r--    1 wonwinmc wonwinmc   513822 Dec 26 11:43 dist/Nevow-0.9.33.tar.gz

Now there is one drawback to this approach, which is that it is just using the version number stored in _version.py and it is not paying attention to the svn revision number. One solution to that is to use the setuptools feature which tracks svn revision numbers. Install setuptools and then:

~/playground/nevow/Nevow$ python ./setup.py egg_info --tag-svn-revision sdist > log.txt
~/playground/nevow/Nevow$ tail log.txt
hard linking nevow/test/acceptance/reconnect.py -> Nevow-0.9.33-r17105/nevow/test/acceptance
hard linking nevow/test/acceptance/tabbedpane.py -> Nevow-0.9.33-r17105/nevow/test/acceptance
hard linking twisted/plugins/nevow_widget.py -> Nevow-0.9.33-r17105/twisted/plugins
hard linking win32/nevow.nsi -> Nevow-0.9.33-r17105/win32
copying setup.cfg -> Nevow-0.9.33-r17105
Writing Nevow-0.9.33-r17105/setup.cfg
creating dist
tar -cf dist/Nevow-0.9.33-r17105.tar Nevow-0.9.33-r17105
gzip -f9 dist/Nevow-0.9.33-r17105.tar
removing 'Nevow-0.9.33-r17105' (and everything under it)
~/playground/nevow/Nevow$ ls -aldtr dist/*
-rw-rw-r--    1 wonwinmc wonwinmc   559899 Dec 26 11:58 dist/Nevow-0.9.33-r17105.tar.gz

05/19/09 14:11:53 changed by zooko

#2831 is a report that this issue also prevents Nevow from working with buildout.

05/21/09 12:52:40 changed by exarkun

  • branch set to branches/build-nevow-without-importing-2699.
  • branch_author set to exarkun.

(In [17303]) Branching to 'build-nevow-without-importing-2699'

05/21/09 12:53:23 changed by exarkun

(In [17304]) Here's an alternate approach to version definition; it avoids importing "nevow" sometimes

refs #2699

05/26/09 14:49:29 changed by benliles

If you are worried about tagging subversion builds, have you heard of the setup.cfg file?

[egg_info]
tag_build = dev
tag_svn_revision = true

http://peak.telecommunity.com/DevCenter/setuptools#managing-continuous-releases-using-subversion

06/08/09 14:50:10 changed by zooko

benliles: yes, that was my suggestion from e.g. #comment:5, #comment:6, #comment:7, #comment:11. I think it would be just dandy if the Twisted maintainers would make their lives easier by removing the feature of generating a version number based on the SVN revision number and if people want that feature they can use setuptools (or hopefully some future version of distutils, or some other tool).

exarkun: I looked briefly at the patch and I think "if setuptools is importable" isn't the right clause. The current trunk works fine when Nevow's setup.py is executing in a fresh Python interpreter with a virginal state, because then "import nevow" will get the right data out of that unbuilt Nevow source (since is usually at the front of sys.path).

The problem arises is setup.py is instead imported or read into a Python interpreter that already has state, which might include a "nevow" entry in sys.modules, or which has a funny sys.path that doesn't put the unbuilt nevow source at the front. Setuptools can cause this to happen, if setuptools is busy building something else and nevow happens to be a dependency of that something else. But other things besides setuptools can do the same thing, including py2exe, and I wouldn't be surprised if other tools like bbfreeze, pip, etc..

These tools think of the contract between them and the Nevow setup.py as requiring less of them -- they just execute setup.py and they don't think they need to clean their sys.modules of entries named "nevow" first. Also, note that even if setuptools is importable then maybe there isn't a sys.modulesnevow? and maybe there isn't any 'nevow' earlier on the sys.path, in which case even though setuptools is importable it wouldn't hurt to use "import nevow" as part of the process of building nevow to get the nice version numbers.

Also, I personally think it is ugly to have "two ways to do it" like that patch does. Maybe everyone in the world could just suffer the raw version instead of the nice version in order to make the setup.py simpler? I think that might be what I was suggesting at the beginning of this comment, if the difference between the nice version and the raw version is the svn revision numbers.

06/08/09 14:50:30 changed by zooko

P.S. Thank you for working on this issue.

06/15/09 10:30:15 changed by exarkun

What led me to the solution in the branch now was the realization that the only described use-case where there is a problem is a use-case where setuptools is importable. That naturally led me to the conclusion that the setuptools-available branch of the code was the place to fix the behavior.

As I understand it, pip depends on setuptools. However, now that you mention py2exe and such other tools, I suppose I can see how this fix isn't going to generalize.

One thing which is missing from this ticket is a description of how to reproduce the current misbehavior. I understand part of the scenario in which it can happen - that another version of the nevow package must be found by the import machinery in preference to the version being installed/built/whatever - but I don't understand how that precondition can be created.

I'm going to go look at py2exe and see if it is obvious how to reproduce the problem in that context. However, a description of how this was initially encountered would be useful (probably more so). A transcript would be particularly nice, but if you just want to clarify the middle paragraph of the ticket description, that's fine too.

06/15/09 10:55:31 changed by exarkun

Okay, my look at py2exe wasn't encouraging. Nevow has numerous significant issues in combination with that tool, so fixing the setup.py interaction with preinstalled versions of Nevow doesn't seem like it should be particularly high priority.

06/23/09 21:45:38 changed by glyph

I don't like this bug, because it runs counter to a long-term goal I have for every setup.py in Twisted and every Divmod project.

I want setup.py to be as short as possible. I want all the code that gets executed having to do with distribution to be present in modules, which can be imported and unit-tested.

I would also really like it if distribution infrastructure were handled by the appropriate project; for example, Twisted should have a function in it that allows the calling project to say "hey, I want to include some Twisted plugins" which modifies the distutils metadata appropriately, including adding post-installation hooks and other such things. I would also like these functions to just say "some Twisted plugins" and let the distribution code figure out which ones based on the filesystem structure. Convention over configuration and all that jazz. Happily Nevow doesn't really need anything special from distutils for JS modules, but if it did, it should provide a function to deal with it.

Is there any way we can be setuptools, pip, py2exe and easy_install friendly and still put distribution code into libraries along with all of our other code?

06/24/09 11:20:16 changed by zooko

exarkun: I believe the way this originally came up is described in this ticket: http://allmydata.org/trac/tahoe/ticket/440 . The history of that ticket is that building Nevow failed because Twisted was not installed at buildtime, and so we attempted to work-around that by making sure that Twisted was installed before Nevow was built, but that triggered a different and confusing bug in setuptools: http://bugs.python.org/setuptools/issue20 (package required at build time seems to be not fully present at install time?).

We were using setuptools so it was importable in this case.

(follow-up: ↓ 23 ) 06/24/09 11:51:28 changed by zooko

glyph: I'm not entirely sure I understand your concern here. I stayed up late last night playing "Tigris and Euphrates". It seems to me that the following patch reduces the amount of code and functionality that Nevow's setup.py uses at build-time:

--- setup.py    (revision 17537)
+++ setup.py    (working copy)
@@ -1,6 +1,11 @@
 #!/usr/bin/python
 
-from nevow import __version__ as version
+RE=re.compile("versions.Version\((.*?),(.*?),(.*?),(.*?)(,.*)?\)")
+for line in open(os.path.join("nevow", "_version.py")):
+    mo = RE.search(line)
+    if mo:
+        version = ".".join([x.strip() for x in mo.groups()[1:] if x and x.strip()])
+        break
 
 try:
     import setuptools

So, by reducing the amount of code used and the functionality (it jettisons the feature of counting patches to generate a -r$COUNT number, although it can use setuptools's feature to accomplish that), this makes it easier to have confidence that it is right, right?

Oh, I get it! Even though it reduces the amount of functionality, it moves it from a module inside nevow into the setup.py. You want code that is importable as a set of functions that you can invoke from within a trial test, rather than code that gets executed when you import a module such as the setup module. Because the latter is harder to test. (Although, note, not impossible to test.)

Yes, I want this too, but having your setup.py import other projects or even the current project causes problems for some build processes. It makes it easyer for those build processes if your setup.py imports other projects as little as possible.

So, since I have a similar desire to move code out of setup.py files and into separate, reusable pieces, that's why I moved the equivalent code out of Tahoe and some of my other projects into darcsver -- http://allmydata.org/trac/darcsver/browser/README.txt . darcsver is a tool to do what the problematic code in Nevow's setup.py does -- generate a version number and embed it into the project that it is building. (darcsver is for darcs instead of SVN.) It uses the setuptools entry points scheme so that it can be used by a setup.py with few or no changes to setup.py and without requiring the user who is going to build the project to be aware of darcsver's existence and to manually install it before building the project.

Unfortunately, the setuptools entry points scheme is a bit limited/buggy and isn't getting fixed, e.g. http://bugs.python.org/setuptools/msg235 , so darcsver is useful only for those cases where you don't run afoul of those bugs. In particular, setup.py's of my projects, such as zfec -- http://allmydata.org/trac/zfec/browser/zfec/setup.py?rev=324#L98 -- still have to have code to read a file and regex-match the version string even though darcsver could do that for them, because of the problem mentioned in http://bugs.python.org/setuptools/msg235 .

Hopefully the work to build a new distutils will give us a way to have modular, reusable build tools which do not impose manual labor on the user before building and which do not complicate the build process by importing large projects or importing the project which is itself being built. And which is supported and has tests.

I guess the point of this comment, then, is that while building your setup.py functionality out of modular, re-usable, testable code is certainly a good idea, having that code be imported by setup.py from the self-project or from other large projects raises other issues. Hopefully the distutils folks (led by Tarek Ziade) will come up with a better solution soon.

(in reply to: ↑ 22 ) 06/24/09 15:03:57 changed by glyph

Replying to zooko:

glyph: I'm not entirely sure I understand your concern here.

It seems like you figured it out later on.

I stayed up late last night playing "Tigris and Euphrates".

I ... really have no idea what this has to do with the rest of your comment :).

It seems to me that the following patch reduces the amount of code and functionality that Nevow's setup.py uses at build-time:

Diffstat has this to say about the patch in question:

 setup.py |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

So it certainly seems to increases the amount of code in setup.py. It also decreases setup.py's testability, and increases duplication, as we are extracting information that the object whose source is being parsed could have provided to us as the result of a method call.

As far as the metric you propose - the amount of code used by setup.py - I really couldn't care less. Isn't the whole point of having abstractions, not needing to care about that?

So, by reducing the amount of code used and the functionality (it jettisons the feature of counting patches to generate a -r$COUNT number, although it can use setuptools's feature to accomplish that), this makes it easier to have confidence that it is right, right?

Except it also uses a regular expression, which instantly takes the confidence that is correct to zero and leaves it there forever :-). Plus, those other things I just said.

Oh, I get it! Even though it reduces the amount of functionality, it moves it from a module inside nevow into the setup.py. You want code that is importable as a set of functions that you can invoke from within a trial test, rather than code that gets executed when you import a module such as the setup module. Because the latter is harder to test. (Although, note, not impossible to test.)

As I said above, it seems like you figured it out.

Except I'm not just talking about testing. Having your distribution code in modules (and, by extension, functions and classes) means it can be selectively imported, and you can do unusual distribution stuff without copying and pasting all of setup.py, or even necessarily execfile()ing it or trying to perform any tricks. You can write a second, different setup.py that invokes the same functions in a slightly different way. I mean, this is arguably the whole reason distutils exists anyway, right? You could inline all of distutils in the middle of your setup.py but we all presumably know why that isn't good.

Yes, I want this too, but having your setup.py import other projects or even the current project causes problems for some build processes. It makes it easyer for those build processes if your setup.py imports other projects as little as possible.

While I agree that this is a practical concern, that's another reason I specifically don't like this ticket. I don't want to build nevow without importing nevow. I would rather address specific problems, like setuptools, easy_install, or py2exe integration, in a way which still allows us to import code from nevow (and epsilon and twisted and so on).

So, since I have a similar desire to move code out of setup.py files and into separate, reusable pieces, that's why I moved the equivalent code out of Tahoe and some of my other projects into darcsver -- http://allmydata.org/trac/darcsver/browser/README.txt . darcsver is a tool to do what the problematic code in Nevow's setup.py does -- generate a version number and embed it into the project that it is building. (darcsver is for darcs instead of SVN.) It uses the setuptools entry points scheme so that it can be used by a setup.py with few or no changes to setup.py and without requiring the user who is going to build the project to be aware of darcsver's existence and to manually install it before building the project.

This sounds like a good idea, but, Nevow's users have balked in the past at adding even a small dependency. easy_install doesn't mitigate the difficulties with that for everyone. It also creates more work (and provides more opportunity for error) for OS packagers of Nevow, who are already having a hard time with it. Plus, we really don't want to require setuptools, due to its numerous difficulties...

Unfortunately, the setuptools entry points scheme is a bit limited/buggy

... such as this one ...

and isn't getting fixed, e.g. http://bugs.python.org/setuptools/msg235 , so darcsver is useful only for those cases where you don't run afoul of those bugs. In particular, setup.py's of my projects, such as zfec -- http://allmydata.org/trac/zfec/browser/zfec/setup.py?rev=324#L98 -- still have to have code to read a file and regex-match the version string even though darcsver could do that for them, because of the problem mentioned in http://bugs.python.org/setuptools/msg235 . Hopefully the work to build a new distutils will give us a way to have modular, reusable build tools which do not impose manual labor on the user before building and which do not complicate the build process by importing large projects or importing the project which is itself being built. And which is supported and has tests.

So, there is still a real problem even with this hope. While some things (the specific feature being discussed here, reading SVN version numbers) can be addressed by

Twisted has the Twisted plugin system, and code to support that system. It also has a huge amount of setup.py code already in modules. Much of that code is specific to the structure of the Twisted repository and would not be generally useful outside of that context. Now, we could split off that code into a separate "distribution" (in the distutils sense) and have it installed separately, but for many users that would just complicate the experience of getting Twisted started.

More generally: lots of projects, both libraries and applications, have custom distribution requirements, and they shouldn't be required to create a totally generic, separately distributed, separately managed project that just has a few little distutils hooks for their custom data files or post-installation setup.

I guess the point of this comment, then, is that while building your setup.py functionality out of modular, re-usable, testable code is certainly a good idea, having that code be imported by setup.py from the self-project or from other large projects raises other issues. Hopefully the distutils folks (led by Tarek Ziade) will come up with a better solution soon.

IMHO, these "other issues" which are raised by self-project importing setup.py are the responsibility of the distribution tools to fix. I appreciate that easy_install is providing a useful service to the community, but I'm really tired of wasting a huge amount of effort working around bugs in distutils and its extensions. This used to work, some tools (setuptools, py2exe) broke it, so (setuptools, py2exe) should fix it.

If the intention of distutils is really that setup.py should not work like a normal python script (i.e. put dirname(__main__.__file__) at the head of sys.path) then a simple, alternate option should be put forward for allowing code to be imported from the project being built.

All that said, zooko, I want to be clear that I'm not casting aspersions on you personally here. I do really appreciate all your efforts to make Twisted and Nevow "good citizens" of the distutils universe, while simultaneously following up on distutils, setuptools, py2exe (etc) bugs to fix the underlying problems which make that so difficult. So, thank you, keep doing what you're doing, and perhaps in the meanwhile we'll have to make Nevow build without importing itself, at least when setuptools is around.

I just want to be clear that in the long term, I want to put the version-detection code, as well as 90% of what's currently in setup.py, into a module in somewhere. If that module can conveniently be somewhere other than the 'nevow' package, okay, but I'm skeptical.

06/24/09 16:02:34 changed by zooko

I think this is a classic case of separately-developed codebases disagreeing about the scope of their interface. Glyph says distribution tools should allow importing from the self project in its setup.py. PJE says [http://bugs.python.org/setuptools/msg139 "Packages with dependencies should *never* import from those dependencies during setup.py execution, whether they're using setuptools or distutils."]. I don't think this is a matter of one side being more right than the other -- rather I think that it is a classic case of one component wanting the other component to provide more services or to support a wider scope of use cases, and the other component wanting the first component to ask less of it. Either way will work.

I actually agree with Glyph that I would like for distribution tools like distutils to offer a clean, modular, testable way to extend your setup.py with additional functionality. I'm not sure whether that way ought to be importing from the Python package is being built. But my current problem is not "the distutils is too limited", my problem is "I can't write programs that automatically install Tahoe-LAFS on various platforms and in various contexts such as zipped py2exe packages". Having Nevow ask less of the distribution tool will very likely help, not only with setuptools and py2exe, but potentially with other tools as well.

In other words, what I most want at this point is for both sides to agree on a contract so that I can use the two in combination, and what the exact contract says isn't that important to me.

Exarkun has written a patch in this ticket which detects whether the component on the other side of the interface is named "setuptools", and if so asks less of it (and therefore offers less to your own clients). I objected that there may be other components on the other side of that interface, not named "setuptools", but also wanting components on this side to ask less of them. I know from experience that py2exe shares this quality. I also objected that using the distribution tool in two different ways is uglier and more complicated than using it in just one way.

However, I suppose that's me wanting to improve Nevow itself -- as a mere user of Nevow who doesn't care how ugly it is as long as he can install it, I will happily accept exarkun's patch.

07/01/09 17:05:59 changed by benliles

By importing the version from Nevow which imports from Twisted, you create a pre-setup dependency on Twisted. You cannot run the setup.py unless you already have Twisted in your system path.

Setuptools does provide a method for requiring external packages at setup time, but it won't work in this case since the external package, Twisted, is required before the setup function executes.

From the code in subversion, it looks like there is a script that generates the Nevow/_version.py which is used to create the version. Why not simply modify that script so that the version number is written out to both the Nevow/_version.py and setup.py?

07/01/09 17:21:04 changed by exarkun

Why not simply modify that script so that the version number is written out to both the Nevow/_version.py and setup.py?

This form of solution might fix the immediate problem, but it doesn't address the larger issue which Glyph discussed in his most recent comment.

(follow-up: ↓ 28 ) 07/03/09 10:28:19 changed by zooko

As I understand it, Glyph's objection to is that it is hard to test things which happen when you import the module (the setup.py file) along with a bunch of other things that you can't prevent from happening at the same time (everything that is executed when the setup.py is imported).

This is a good objection to the very notion of how setup.py is designed. I hope Glyph posts about this on the distutils-sig mailing list and future versions of Python have a more modular and testable convention for build logic.

The Twisted/Divmod/Nevow way to make the build logic more testable is, as I understand it, to put it into modular pieces inside the project itself (which is to be built) and have the setup.py import those pieces.

The problem with this approach is that it requires more of the environment in which the setup.py is executed. In particular, py2exe might import an old version of Nevow from elsewhere in the system, and then try to build a new Nevow package. The import nevow in Nevow's setup.py would then yield the old one (which is already present in sys.modules) and not read the files of the new one from the source directory that is being used to build. There is a similar problem with building Nevow automatically using setuptools when it is a dependency of Tahoe-LAFS: the build system might not have Twisted already built and importable when it is time for it to build Nevow. Both of these issues would be eliminated if Nevow didn't need to import itself at build time, and in addition other undiscovered issues in these or other build/deployment environments could only be helped by Nevow requiring less of its build environment.

As a short-term way to make Nevow's version-number-detection be modular and testable, and also compatible with setuptools and such tools, how about this hack:

--- setup.py    (revision 17629)
+++ setup.py    (working copy)
@@ -1,7 +1,14 @@
 #!/usr/bin/python

-from nevow import __version__ as version
+def get_version(verfile):
+    RE=re.compile("versions.Version\((.*?),(.*?),(.*?),(.*?)(,.*)?\)")
+    for line in open(verfile):
+        mo = RE.search(line)
+        if mo:
+            return ".".join([x.strip() for x in mo.groups()[1:] if x and x.strip()])

+version = get_version(os.path.join("nevow", "_version.py"))
+
 try:
     import setuptools
 except ImportError:

This way a test can import setup.py and execute its get_version() function in order to test the code in that function.

Also, benliles's suggestion of having the generation script write into setup.py seems like it would work, as well.

I guess in general, exarkun wrote:

This form of solution might fix the immediate problem, but it doesn't address the larger issue which Glyph discussed in his most recent comment.

Tahoe-LAFS depends on about sixteen Python packages. I really want Nevow to stop being the one that fails to build or install on various platforms or requires special work-arounds in the Tahoe-LAFS build logic. I respect exarkun's and Glyph's practice of refusing to make their code less maintainable or less testable in order to add a feature, so I am carefully not asking for that. Instead, I'm trying to think of ways that Nevow can be kept at least as testable as it currently is while also fixing this issue. That doesn't mean that we have to solve the whole problem of how to have testable, modular build systems in Python before we can fix this issue. Either benliles's hack or my hack seem to fix this issue while allowing the code that was changed (the version snarfing code) to be automatically tested by the Nevow buildbot.

Thanks!

(in reply to: ↑ 27 ; follow-up: ↓ 29 ) 07/03/09 18:14:14 changed by glyph

Replying to zooko:

As I understand it, Glyph's objection to is that it is hard to test things which happen when you import the module (the setup.py file) along with a bunch of other things that you can't prevent from happening at the same time (everything that is executed when the setup.py is imported).

It's not just that it's hard to test. It's hard to change the parameters of when and where it's executed. Testing is just one way you might want to change those parameters. Of course, running under py2exe is another, but if we change it to be runnable under py2exe rather than testable, then I'm sure a bunch of other things we don't know about will break.

This is a good objection to the very notion of how setup.py is designed. I hope Glyph posts about this on the distutils-sig mailing list and future versions of Python have a more modular and testable convention for build logic.

I will do my best. Python development mailing lists consume a lot of energy and I'd rather direct that to other pursuits, but you set an example I will strive to follow :).

The Twisted/Divmod/Nevow way to make the build logic more testable is, as I understand it, to put it into modular pieces inside the project itself (which is to be built) and have the setup.py import those pieces.

I think it's important to break this strategy down into its constituent parts:

  1. put modular pieces into modules
  2. put those modules into the project in question
  3. import those pieces from the setup.py.

Part one is extremely important to me. Anything that involves putting code somewhere that isn't a module means it's going to execute in an even less controlled and reliable environment than we've already got. Python code should live in modules, because in the wild and unpredictable world of imperative execution and dynamic typing, you need some assumptions you can rely on, and "I can load this code by importing it" is a pretty basic one.

Part two isn't important per se; it just seems to be the location which makes the most sense to me. If the build support is for nevow, why wouldn't we put it in nevow? I realize that in the next paragraph you outline a reason for exactly that, but for me, that reason boils down to "because some tools which use distutils are buggy".

Part three isn't really important either. If the setup.py needs to do something other than the standard python 'import' dance to get the code loaded and behaving normally, fine.

The problem with this approach is that it requires more of the environment in which the setup.py is executed. In particular, py2exe might import an old version of Nevow from elsewhere in the system, and then try to build a new Nevow package. The import nevow in Nevow's setup.py would then yield the old one (which is already present in sys.modules) and not read the files of the new one from the source directory that is being used to build.

I think we need to talk about this in a lot more detail. These use-cases keep coming up, but they haven't really been explained. On my computer, py2exe doesn't ever try to import an installed version of nevow when building something which uses nevow; and I've used py2exe to do that a number of times in the past. I have difficulty imaginging a configuraiton where it would. Does py2exe itself do templating with nevow under some circumstances?

The particulars are important, because it sounds to me like a potential solution here would be to tell setup.py to do some of the work (for example, determining arguments to setup) by spawning off a subprocess, or temporarily cleaning sys.modules, or doing some other gross, but simple, hack to set up a deterministic import environment where we can load code from the system being built. However, I can't propose such a solution if the problem is just vaguely "well, there might be some junk that got loaded from somewhere".

There is a similar problem with building Nevow automatically using setuptools when it is a dependency of Tahoe-LAFS: the build system might not have Twisted already built and importable when it is time for it to build Nevow. Both of these issues would be eliminated if Nevow didn't need to import itself at build time, and in addition other undiscovered issues in these or other build/deployment environments could only be helped by Nevow requiring less of its build environment.

If your build system "might not" have already built Twisted, why don't you fix your build system so that it "will" have already built Twisted? I don't intend to be sarcastic, I honestly don't understand the circumstances where a system whose entire purpose is ostensibly to build and install things would mysteriously fail to build and install something, and yet it's not that system's fault. Nevow depends on Twisted and setup_requires Twisted, so shouldn't Twisted be built and deployed by the time Nevow gets started? Or is it Twisted's fault that it isn't built and installed? Is there a bug in the setup.py, either Nevow's or Twisted, that mis-instructs setuptools or distutils to not build and install it at the appropriate point in the dependency build process?

Also, I would understand more if this were a bug that just happened all the time, but why "might" not the build system have built Twisted? Is this like a dict-ordering-is-non-deterministic bug, or do certain systems cause the build to happen out of dependency order?

As a short-term way to make Nevow's version-number-detection be modular and testable, and also compatible with setuptools and such tools, how about this hack:

That's the third time you've quoted a very similar hunk inline. In the future, could you make the patch an attachment for easier download and application?

This way a test can import setup.py and execute its get_version() function in order to test the code in that function.

As you may imagine, import setup is a very, very sticky wicket indeed. Do you have any suggestions for the way this setup "module" might be disambiguated from the 900 other setup "modules" on my system, many of which are on my sys.path, probably before nevow's?

Also, benliles's suggestion of having the generation script write into setup.py seems like it would work, as well.

I thought I proposed this earlier in this ticket discussion, but it must have been somewhere else: in the long - or hopefully even medium term - I believe we could solve the problem by inverting it. Nevow (and Twisted) could generate its setup.py by importing its own code, just as I like. It's easy enough to get an environment with Combinator where I can import Nevow without thinking about distutils at all to bootstrap that process. Even better, however, would be to have setup.py be something extremely short that loads a pile of static metadata, and have the static metadata be generated ahead of time.

This does mean that installing directly from version control would require an additional step, as setup.py would effectively become a generated file (even if we didn't generate the setup.py itself). However, release tarballs could all include it.

I guess in general, exarkun wrote:

This form of solution might fix the immediate problem, but it doesn't address the larger issue which Glyph discussed in his most recent comment.

Tahoe-LAFS depends on about sixteen Python packages. I really want Nevow to stop being the one that fails to build or install on various platforms or requires special work-arounds in the Tahoe-LAFS build logic.

I vehemently agree. More importantly to me, the ability to simply move related code into an additional dependency without worrying over the excruciating agony this will cause our users would be fantastic. For example, there are a bunch of things in Nevow which are duplicated by Epsilon, and it would be really nice to just have that be a dependency and forget about it, but Nevow users would be inconvenienced even further by an array of setuptools bugs we're not even talking about yet.

Again, thanks for toughing out this difficult discussion process to help us all figure out what the right way forward is.

I respect exarkun's and Glyph's practice of refusing to make their code less maintainable or less testable in order to add a feature, so I am carefully not asking for that.

Thank you. As I hope you realize, this is not because we hate features, but because features so added will inevitably break in future releases and make their users (you) sad :).

Instead, I'm trying to think of ways that Nevow can be kept at least as testable as it currently is while also fixing this issue.

An important point here is that, right now, I don't feel that Nevow is testable enough. Exarkun's solution, in the branch, I think satisfies your requirement but doesn't point to a way that Nevow could become more testable in the future while keeping this issue fixed.

That doesn't mean that we have to solve the whole problem of how to have testable, modular build systems in Python before we can fix this issue. Either benliles's hack or my hack seem to fix this issue while allowing the code that was changed (the version snarfing code) to be automatically tested by the Nevow buildbot.

The version snarfing code is vulnerable to a whole class of bugs. Sufficiently creative changes to the whitespace, indentation, the way the version is calculated, or the Version object's API will break it in ways which are difficult to detect and diagnose. Putting it into a function mitigates that, of course, but we need to put that function somewhere we can import it so it can be tested - and my first impulse is to say, why don't we put it into the package we're developing! That's where all of our other code goes! :).

(in reply to: ↑ 28 ) 04/18/10 10:55:16 changed by zooko

  • cc changed from zooko,zooko@zooko.com to zooko, zooko@zooko.com, zookog@gmail.com.
  • owner changed from amir to glyph.

Replying to glyph:

The version snarfing code is vulnerable to a whole class of bugs. Sufficiently creative changes to the whitespace, indentation, the way the version is calculated, or the Version object's API will break it in ways which are difficult to detect and diagnose. Putting it into a function mitigates that, of course, but we need to put that function somewhere we can import it so it can be tested - and my first impulse is to say, why don't we put it into the package we're developing! That's where all of our other code goes! :).

Okay, so how about this: put it into a file in the package you are developing, but then get access to it from your setup.py with execfile instead of with import. The problem with import is that the nevow that you get by import may not be the nevow that you are building. For example, if there is already an entry named "nevow" in your sys.modules then that's the one you'll get from import, when what you wanted was the one in the same directory as your setup.py file. execfile would avoid that confusion, while otherwise leaving your version-parsing function in a modular, testable, re-usable location. If that sounds plausible to you I'll work up a patch for this ticket.

04/18/10 13:11:56 changed by exarkun

Someone tried an execfile-based solution at some point and unfortunately ran into something that caused it not to work. However, messing about with execfile in the REPL doesn't reproduce this. So maybe as long as we don't accidentally reproduce the original attempt, it'll be okay.

This still requires importing twisted during evaluation of setup.py though. Is that okay? (Or maybe this is just one step towards the ideal solution?)

04/18/10 13:31:42 changed by exarkun

Ah, it was you (zooko). And the problem arose when a __name__ was given to the namespace passed to execfile. But if you leave the name out, then it seems to work alright.

I think this sounds like a good direction to explore, anyway.

04/19/10 18:01:14 changed by exarkun

I had a conversation with some nice people on #distutils yesterday after I commenting here. The discussion meandered a bit, but one idea that seemed to come out of it was this.

Nevow's setup.py should more or less work how it does now, importing things from nevow as necessary. It should also override the sdist command, though, to spit out a new setup.py (and perhaps a MANIFEST.in and a setup.cfg and other such things). The new setup.py should not import nevow. All of the new stuff can be generated by code in the nevow package that works and is tested in the usual way.

I'm sure there's some problem with this idea, but I wanted to record it here.

jethro@divmod.org