Python 3000 - Adaptation or Generic Functions?
by Guido van Rossum
April 5, 2006
Summary
We've started discussing Python 3000 for real. There's a new mailing list and a branch. The first point of order is about process; a slew of meta-PEPs are being written (and the goal is to avoid a repeat of Perl 6 :-). But I'm blogging about a feature proposal that evolved dramatically over the past days.
Alex Martelli has been arguing for adaptation since the dawn of time; and at various times he's chided me for not seeing the light. Well am I ever grateful for dragging my feet!
Adaptation
Let me first briefly describe adaptation, reduced to its simplest form. The motivation comes from a common situation where an object wrapper is required (aptly named the Adapter Pattern). PEP 246 proposes a built-in function adapt(X, P) where X is any object and P a Protocol. We intentionally don't define what a protocol is; all we care about that it can be represented by an object. The call adapt(X, P) returns an object constructed from X that satisfies P, or raises an exception if it can't. It uses a global registry which maps types and protocols to adapter functions; we can write this as a dict R = {(T, P): A, ...}. Then, adapt(X, P) computes the adapter A = R[type(X), P] and then returns A(X). There is a registration function register(T, P, A) which simply sets R[T, P] = A. Read Alex's post for a gentler explanation (and lots of things I left out).
As soon as Alex had posted this version of how adaptation works, several people (including myself) independently realized that the global registry (which has always been a sore point to some) is a red herring; the registry could just as well be separated out per protocol! So now we make adapt() and register() methods on the protocol: instead of adapt(X, P) we write P.adapt(X) and instead of register(T, P, A) we write P.register(T, A). The signature of A is unchanged. I call this "second-generation adaptation".
The nice thing about this is that you no longer have a fixed global implementation of exactly what register() and adapt() do. Alex mentions a whole slew of issues he's ignoring but that would need to be addressed for a real-life implementation, such as how to handle adaptation for an object where its type hasn't been registered but some base type has been registered; or the notion of inheritance between protocols (useful when you equate protocols with interfaces, as in Zope and Twisted); or automatic detection of the where an object already implements a protocol/interface (again useful in Zope and Twisted). Several of those extensions make lookup performance an issue, and there are different ways to address that. By having multiple protocol implementations (each implementing the same adapt() and register() APIs) each framework can have its own notion of how adaptation works for its own protocols, without having to deal with a fixed global implementation of adaptation that may or may not do the best thing for a particular framework.
Generic Functions
But then Ian Bicking posted a competing idea: instead of adaptation, why don't we use generic functions? In his and Phillip Eby's view these are more powerful than adapters, yet in some sense equivalent. Let me briefly describe generic functions to set the stage.
A generic function, G, is a callable that behaves like a function (taking arguments and returning a value) but whose implementation is extensible and can be spread across different modules. TG contains a registry of implementations which is indexed by a tuple of the combined types of the arguments. Suppose we want G to be callable with two arguments, then the registry would map type pairs to implementation functions. We'd write G.register((T1, T2), F) to indicate that F(X1, X2) is a suitable implementation for G(X1, X2) when type(X1)==T1 and type(X2)==T2. The simplest implementation would just map the arguments to their types (or better, classes), convert to a tuple, and use that as a key in the registry to find the implementation function. If that key is not found, a default implementation is invoked; this is provided when G is first defined and can either provide some fallback action or raise an exception.
A useful implementation of generic functions must also support looking for matches on the base types of the argument types. This is where things get hairy, at least when you have multiple arguments. For example, if you have a solution that is an exact match on the first argument and a base type match on the second, and another that is a base type match on the first argument an exact match on the second; which do you prefer? Phillip Eby's implementation, RuleDispatch (part of PEAK) refuses to guess; if there isn't one dominant solution (whatever that means) it raises an exception. You can always cut the tie by registering a more specific signature.
C++ users will recognize generic functions as a run-time implementation of the strategy used by the C++ compiler to resolve function overloading. Fortunately we're not bound by backwards compatibility with C to repeat C++'s mistakes (which, for example, can cause a float to be preferred over a bool). Lisp or Dylan users (are there any left? :-) and PyPy developers will recognize them as multi-methods.
In order to contrast and compare the two ideas, I posted a very simple version of adaptation and generic functions, applied to the idea of reimplementing the built-in iter() function. I used descriptors for registration, making the signatures slightly different from what I showed above, but the essence is the same.
The Lightbulb Went Off
Now we're ready for the Aha! moment (already implicit in Ian's and Phillip's position), brought home by an altenative version of the Protocol independently developed by Tim Hochberg: P.adapt(X) is just a verbose way of spelling a generic function call G(X)!
Interestingly, it took Alex a little time to like this -- he was thinking of adaptation as more powerful because it can return an object implementing several methods, while doing the same thing with generic functions would required a separate generic function for each method. But of course we can write a generic factory function, which returns an object with multiple methods, just like adapt() can; and the generic function approach wins in the (common) case where we want to adapt to a "point protocol" -- an interface with just one method, which we immediately call in order to obtain the desired result. When using adaptation, this would require each adapter to use a helper class with a single method that does the desired operation; when using a generic function, the generic function can just do the operation. And we haven't even used generic function dispatch on multiple arguments!
I'm not sure where this will eventually lead; but I've already killed PEP 246 (adaptation) and PEP 245 (interfaces) in anticipation of a more "generic" proposal.
References
- Alex's seminal post: http://mail.python.org/pipermail/python-3000/2006-April/000267.html
- Ian Bicking's post: http://mail.python.org/pipermail/python-3000/2006-April/000342.html
- Phillip Eby's PEAK, including RuleDispatch: http://peak.telecommunity.com/
- My toy implementations of adaptation and generic functions: http://mail.python.org/pipermail/python-3000/2006-April/000425.html
- Phillip Hochberg's implementation of adaptation: http://mail.python.org/pipermail/python-3000/2006-April/000402.html
- Py3K mailing list: http://mail.python.org/mailman/listinfo/python-3000
- Py3K branch: http://svn.python.org/view/python/branches/p3yk/ (sic)
Acknowledgements
- Aahz, for bringing Alex's post to my attention
- Google, for getting Alex and me together on the same campus so we could have face time
我们开始真正地讨论Python3000了。这里有一个新的邮件列表和一个版本分支。首要的问题是关于流程的。Python 增强建议书(Python Enhancement Proposal,简称PEP)的很多新格式正在制定,目的是为了避免重蹈Perl 6的覆辙:-)。这篇blog所讨论的东西在过去一段时间里已经发生了很大的变化。
自盘古开天地之日起,Alex Martelli就一直是配接的忠实拥护者。他经常埋怨我对配接的光芒视而不见。现在,我为自己当时的不开窍而感到庆幸。
我先用最简单的形式来介绍一下配接吧。当需要用到对象包装器(object wrapper)[配接器模式(Adapter Pattern)或许更贴切]时,我突然萌发了这个想法。PEP246打算提供一个内置的函数adapt(X, P), X可以是任何对象,而P也可以是任何Protocol。我们故意不对protocol进行定义,只要它可以通过对象表现出来就可以。调用adapt(X, P)返回一个由X构建并满足P的对象,如果创建对象失败,则抛出一个异常。它使用全局注册表(global registry)为配接器功能提供了类型和protocol之间的映射关系。我们可以写为dict R = {(T, P): A, ...}。然后,adapt(X, P) 计算出adapter A = R[type(X), P],并返回A(X)。 还有一个注册函数register(T, P, A),它简单地设置 R[T, P] = A。请参见Alex更为精彩的解释,他补充了很多我遗漏掉的东西。
当Alex提出他对配接工作原理的这一看法,好几个人(包括我自己在内)都意识到全局注册表(对一些人来说,它永远是一个伤痛)是没有必要的。每个protocol都可以有自己的注册表(registry)。所以,现在我们在protocol上使用adapt()和register()方法。我们使用P.adapt(X)而非adapt(X, P),使用P.register(T, A)而非register(T, P, A)。A的签名(signature)保持不变。我称之为第二代配接(second-generation adaptation)。
对于register()和adapt()到底有什么用,你们再也不用局限于一种固定的全局实现,这是一个好消息。Alex提到了很多他忽略的问题,但是如果要真正实现,这些问题需要得以解决。例如,如何处理对象类型未被注册而一些基本类型已被注册的配接,protocol之间的继承是如何定义的(当你把protocol和接口看得同等重要时会很有用,就像Zope和Twisted一样),对象已实现protocol/接口时的自动检测(这在Zope和Twisted中有用)。这些扩展(extension)使得查找性能(lookup performance)成为一个问题,我们可以通过几种不同的方法来解决。通过多重协议实现(multiple protocol implementations)(每次都实现adapt()和register() APIs),每个框架(framework)都对配接如何为其自身拥有的protocol服务有自己的主张,而没必要使用一个固定的全局实现。对于一个特定的框架而言,配接的全局实现可能达到最佳效果,但也未必就是最好的选择。
Ian Bicking提出了一个对立的观点:我们为什么不使用泛型函数而非配接呢?他和Phillip Eby都认为泛型函数具有的功能比配接器更强大,至少可以与之抗衡。现在我就来简要说一下泛型函数。
一个泛型函数G,可以被调用,这种行为类似一个普通函数(取参数并返回一个值),但其实现是可扩展的(extensible),并可以在不同的模块中进行定义。TG包含一个由复合类型参数的元组索引的注册表实现。假设我们想让具有两个参数的G可调用,那么注册表将会把type pairs映射到实现的函数中。我们可以使用G.register((T1, T2), F) 来显示地定义,当type(X1)==T1、type(X2)==T2时,F(X1, X2)是G(X1, X2)的合适的实现。 最简单的实现就是把参数映射到它们的类型(类或许更好),转换为元组,并利用它作为注册表的键值来找到实现函数。如果没有找到健值,就调用缺省实现,前提是预先定义G,并提供一些恢复措施或抛出一个异常。
一个有用的泛型函数实现也必须支持在参数类型的基本类型上查找匹配。这就是使事情变得复杂的地方,特别是当你有多个参数时。例如,你有一个实现方案,它与第一个参数完全匹配,基本类型与第二个参数匹配;另一个实现方案是,它与第二个参数完全匹配,基本类型与第一个参数匹配。这种情况下,你会选择哪一种?Phillip Eby的实现、RuleDispatch (part of PEAK) 拒绝做出猜测; 如果没有占优势的实现方案(不管它是什么意思),都会抛出异常。你可以通过注册一个更加具体的签名来彻底解决问题。
C++用户会认为泛型函数是一个C++编译器用来解决函数重载问题的策略的运行时实现。幸运的是,我们不需要与C向后兼容,从而避免了重蹈C++的错误(如,导致浮点类型的优先级高于布尔类型。)。Lisp或 Dylan用户(不知是否还存在:-),以及PyPy 开发者会认为它们是多重方法(multi-methods)。
为了对比上述两种观点,我提出了一个关于配接和泛型函数的一个简单版本,通过这一版本来再现内置iter()函数的重复实现。我在注册表中使用了描述符,这使得签名与我上面所述有一些轻微的差别,但实质是一样的。
胜负分晓了
现在我们已经为庆祝时刻做好准备了。Tim Hochberg独立开发的一个可以替代的Protocol版本给我们带来了这一欢乐时刻。P.adapt(X)只是泛型函数G(x) 调用的另外一种冗长形式罢了。
有趣的是,Alex费了一些周折才开始喜欢上它。他过去一直认为配接的功能更强大,因为配接可以返回实现多个方法的对象,而泛型函数要实现同样的功能则要求每个方法都有一个单独的泛型函数。当然,我们可以使用泛型工厂函数(generic factory function),它可以像adapt()一样返回带有多个方法的对象。当我们想要适应"point protocol"时,泛型工厂函数也可以起到作用。point protocol是一个只有一个方法的接口,可以马上调用,以获得想要的结果。在使用配接时,这可能要求每个配接器使用一种单一方法的helper类,helper类来完成预期的运行。在使用泛型函数时,泛型函数则可以完成运行。我们还没有在多个参数上使用过泛型函数分派。
我不知道这样是否会成功,但是还没有想到更加通用的方案前,我已kill掉了PEP 246 (配接) 和PEP 245 (接口) 。
(原文链接网址:http://www.artima.com/weblogs/viewpost.jsp?thread=155123;Guido van Rossum的英文blog网址: http://www.artima.com/weblogs/index.jsp?blogger=guido)
没有评论:
发表评论