Objects, property visibility, and trade-offs

As if on cue, the public vs. private debate has sprung up again within Drupal. The timing is fitting given my last blog post on programming language paradigms. Of course, property visibility is not a new debate, and the PHP community debates this subject from time to time (sometimes humorously).

What I believe is usually missing from these discussions, and what I hope to offer here, is a broader picture view of the underlying assumptions that lead to different conclusions about when different visibility is appropriate (if ever).

In short: It's the difference between procedural-think and object-think.

Procedural think

Those coming from a procedural background, I find, tend to think in terms of what is possible in a procedural language. Any procedural language has a concept of a variable (duh), and any procedural language worth using has some way of creating more complex data structures to be used as variables. In C there is the concept of a struct, for instance. In PHP one can largely emulate the same structure (no pun intended) using PHP's ridiculously flexible arrays and good documentation.

In either case, what one is doing is simply clustering variables together to make them easier to work with. To actually operate on that variable cluster requires a function that understands that structure.

To be sure, one can do an incredible amount with such an approach. In a past life I was a Palm OS developer, and the Palm OS API was entirely procedural. Most functions took some sort of struct as their first parameter and would manipulate that struct in some way. Occasionally you could manipulate the struct directly, but it was usually documented as unsupported, "do this and it will likely break", although the API authors tried hard to not break the struct definition. In a sense it was poor-man's classes and methods but without the class or method.

With this approach, you're at best talking yourself into not making use of functionality that's right in front of you because it could break (or some host company like Palm says "naughty naughty", or like Apple rejects your app). However, even with being able to name structs in C you're not doing anything more than clustering variables to make the syntax cleaner.

In a procedural approach, the position of pre-created variables within the structure is the contract between caller and callee. The behavior of that structure is completely undefined.
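
A minimal sketch of that procedural contract in PHP (7.1+ syntax). The function names and array keys here are hypothetical, invented only for illustration: the array layout is the entire agreement between caller and callee, and nothing stops a caller from changing it directly.

```php
<?php
// Hypothetical "struct": a plain array whose keys are the whole contract.
function customer_create(string $first, string $last): array {
  return ['first_name' => $first, 'last_name' => $last];
}

// A function that understands the structure and operates on it.
function customer_full_name(array $customer): string {
  return $customer['first_name'] . ' ' . $customer['last_name'];
}

$customer = customer_create('Grace', 'Hopper');

// Callers can reach straight into the structure; nothing guards it.
$customer['last_name'] = 'Murray';

echo customer_full_name($customer), "\n";
```

If the keys were ever renamed, every caller that touched them directly would have to change as well.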

Object think

As noted in the previous article, in an OOP paradigm one doesn't think about clustering variables. One thinks about objects as being a single opaque entity that has behaviors. Instead of thinking in terms of integers and strings, one thinks in terms of domain objects: Nodes, or Posts, or Users, or Dates, or Customers.

These objects are atomic; they're just as atomic as an int or string would be in a procedural approach. And therein lies the power.

If I'm accessing a string in, say, PHP, I don't know, or care, if it's being stored internally as a character array. Or if I concatenate something to the string, I don't know or care if the character array is being extended in memory, moved to a new location in memory that is bigger, or if an entirely new variable is created and both strings are then put into it, destroying the first string. That's not my problem. When writing my code, I should not have to care about the memory management that goes on under the hood. If I could access it and tried to manipulate it myself for some reason, I'm more likely than not to cause a fatal error and segfault my entire PHP process, if not now then as soon as the next minor point release of PHP tweaks the string management code. Don't think that's a problem in practice? Why do you think some people hate C, where you're doing that sort of code back-stabbing all the time? :-)

In an OO model, if I'm accessing a Customer object then I don't know, or care, whether it's being persisted internally to SQL or to a flat file. When I call $customer->getName(), I don't know or care whether that name is stored as a single string within the object, as multiple strings that get concatenated together, or whether it generates a new request to a SOAP service to look up the name. In fact, it's a design flaw if I even need to care which it is. If I start mucking with it directly, then as soon as someone switches from local MongoDB to a remote SOAP service my code falls apart and fatals.

There is no $customer->name variable. As far as I, in calling code, am concerned, it doesn't exist. All that exists is the defined contract of the interface methods.

In an OOP approach, the methods that are exposed in the interface are the contract between the caller and callee. The underlying logic and primitive variables are completely undefined.
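
As a rough sketch of that contract (PHP 7.1+ syntax), assume a hypothetical CustomerInterface with two equally hypothetical implementations. The callable passed to RemoteCustomer merely stands in for a remote lookup such as a SOAP call; no real service API is implied.

```php
<?php
// Hypothetical interface: the only thing calling code may rely on.
interface CustomerInterface {
  public function getName(): string;
}

// One implementation stores the name as two strings and concatenates them.
class LocalCustomer implements CustomerInterface {
  private $first;
  private $last;

  public function __construct(string $first, string $last) {
    $this->first = $first;
    $this->last = $last;
  }

  public function getName(): string {
    return $this->first . ' ' . $this->last;
  }
}

// Another asks a lookup callable, standing in for a remote service.
class RemoteCustomer implements CustomerInterface {
  private $lookup;
  private $id;

  public function __construct(callable $lookup, int $id) {
    $this->lookup = $lookup;
    $this->id = $id;
  }

  public function getName(): string {
    return ($this->lookup)($this->id);
  }
}

// Calling code depends on the interface alone, never on the storage.
function greet(CustomerInterface $customer): string {
  return 'Hello, ' . $customer->getName();
}
```

Swapping one implementation for the other requires no change to greet(), which is the point of treating the methods as the contract.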

Eclipse of the Sun

Once again, I blame Sun for why many people don't "get" that distinction. In Java, there is a concept of a JavaBean. A Bean is a class in Java that has a no-parameter constructor, is serializable, and has a getX() and setX() method that corresponds to every property X. That is, it is an object that is really a struct and offers little if any advantage in being an object in the first place.

In so many conversations about OOP, I hear people ask "well where are your getters and setters?" Which is usually followed by "if you have to have getters and setters then why bother making properties non-public?" The latter is a perfectly valid question, but is backwards. Why bother having matching getters and setters for every property? That breaks encapsulation, one of the key reasons for using an object approach in the first place, and is one of the reasons why JavaBeans are, in most cases, a horribly bad model to follow. They are a terrible example of OO. They are Naked Objects wearing a see-through negligee. While there are valid use cases for such a design, they are not representative of "good" OO. I encourage most PHP developers to forget they exist, as the Bean approach in most cases defeats the entire purpose of using objects.
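
To make the contrast concrete, here is a small sketch (PHP 7.1+ syntax) with two hypothetical classes. AccountBean mirrors its property with a getter and setter pair, while Account exposes only behavior and guards its own invariants. Both classes are invented for illustration.

```php
<?php
// Bean style: a matching getter and setter for the property, so the class
// is effectively a struct with extra typing.
class AccountBean {
  private $balance = 0;

  public function getBalance(): int {
    return $this->balance;
  }

  public function setBalance(int $balance): void {
    $this->balance = $balance;
  }
}

// Behavior style: callers say what they want done, and the object decides
// whether that is allowed.
class Account {
  private $balance = 0;

  public function deposit(int $amount): void {
    if ($amount <= 0) {
      throw new InvalidArgumentException('Deposit must be positive.');
    }
    $this->balance += $amount;
  }

  public function withdraw(int $amount): void {
    if ($amount <= 0 || $amount > $this->balance) {
      throw new InvalidArgumentException('Invalid withdrawal.');
    }
    $this->balance -= $amount;
  }

  public function canAfford(int $price): bool {
    return $price <= $this->balance;
  }
}
```

With the bean, any caller can set a negative balance; with the behavioral class, that state cannot be reached from outside.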

It is quite unfortunate that so many schools teach textbook Java as their primary programming language, as in so many ways bad API architecture like Beans encourages bad OO programmers, who then give good OO design a bad name.

PPP what?

When viewed from that standpoint, it's easy to see where the debate about public vs. private/protected comes from. To a procedural way of thinking, hiding properties doesn't make the slightest bit of sense. Those are the data, let me at the data dagnabbit!

From an OO standpoint, however, exposing internal properties makes no sense at all. You're just begging for someone to break your code, or worse yet open up a security hole, and it means you cannot refactor your code when you need to for fear that someone is relying on the current implementation. You cannot improve the API without breaking it. You may as well ship runkit with every copy of PHP and encourage people to change the language syntax out from under you. (This is a step more evil than eval().) You're making me care about the underlying complexity, stop that, just let me tell you what to do dagnabbit!

Certainly both concepts can be taken to an extreme. I did once work on a procedural system where the entire communication mechanism between different parts of the system was global variables. That was a horrid system, let me tell you, despite being the ultimate in "flexible and bare data". I haven't seen anything quite that horrid in OO code myself, unless you count JavaBeans, but I have faith that it exists.

If you're using your classes as structs, then public properties make total sense and anything else is silly.

If you're using your classes as objects, then protected/private properties make total sense and anything else is silly.

Double-think

It has been suggested that Drupal should adopt a "public only" policy, on the grounds that "Drupal is in the business of throwing doors wide open". Both that and the article the suggestion references miss the point; more specifically, the suggestion is to use classes as structs, not as objects.

However, that approach also runs afoul of another part of the Drupal business: being modular. Exposing implementation details of that sort is, as explained above, an inherently non-modular approach. Exposing properties publicly encourages their direct use, which in turn means that we are using classes-as-structs: as little more than arrays that pass funnily. That means losing all of the benefits of encapsulation, abstraction, modularity, and portability that using classes-as-objects offers.

Given the markets that Drupal is moving into, where swappability of components (a mainstay of classes-as-objects) is critical, I believe that to be a very bad trade-off.

There is only one way I can see for such an approach to work, and that would be to adopt and rigorously enforce the following policy:

All public properties are to be treated as internal implementation details and not accessed unless no alternative is available. Accessing a public property is not supported in any circumstance, and the structure, definition, or existence of such properties may change at any time, even in a point-release, without notice. Changes to an object property are never considered an API change.

If we could hold to that, we would essentially be able to have our cake and eat it too: classes-as-objects but with an "out" in case the defined API only does 98% of what we need.

To be perfectly honest, however, I don't think we could pull that off. That's not a slight against Drupal developers but against human beings. The temptation to say "oh, well, it's easier to just grab this property than to file a bug report" is just too high. And then when we refactor something and inevitably break the "unofficial API" (aka those public properties), someone (who may or may not have read the documentation) will come screaming to the issue queues that we broke their site. It doesn't matter if it's core or contrib, it will happen.

What do we say then? "Sorry, you didn't read the docs, go away?" No, we'll find ourselves avoiding changing properties to avoid those sorts of issues. We'll find ourselves saying "Eh, I don't need to think through an API here, people can just grab the property and do what they need". We'll finally give in and, informally, consider properties to be part of the API and try to not change them, even if it would make the code better or provide some new feature.

And we will have gained nothing.

Architect think

Does that leave us with no resolution? No way forward to decide how to build the next generation of Drupal? Hardly. Rather, it puts the onus on us as Drupal developers to decide, situationally, which sort of flexibility we value more.

There are cases where "opaque domain objects" are absolutely the right fit, and we want to think in terms of behavior. The Drupal 7 database layer comes to mind, obviously, but any system where we want to flexibly exchange or chain together disparate components is also a strong fit for classes-as-objects, with all of the benefits and trade-offs that come with it.

There are other cases where we want to turn it inside out, cases where we want to think in terms of raw data. Cases where we have highly unstructured or irregular data come to mind, such as Render API or FAPI, where we need to amorphously aggregate it over time. While those systems certainly could be made in a purely OO classes-as-objects approach, I suspect it would be even more complex than FAPI is now (which is already pretty complicated) and, with all the extra stack calls that would result, much slower. Any sort of info hook is another place where bare data wins, because it's all about definition rather than active behavior.

To be sure, that puts a lot of pressure on us as software architects to think through our APIs and figure out the appropriate technique. The right tool for the right job is never an easy decision, but it is from those decisions that really powerful designs are born.

That is a challenge I do believe we are up for.

Comments

I can only repeat myself

I linked from my blog post and will link from here http://aperiplus.sourceforge.net/visibility.php and I can only repeat what it says: There must be a clear distinction between public interface and private methods. However, this is simply a labelling issue: it doesn't actually need to be enforced.

So first let me counter:

What do we say then? "Sorry, you didn't read the docs, go away?" No, we'll find ourselves avoiding changing properties to avoid those sorts of issues

This is utter nonsense. From one version to the next we happily rewrite whole APIs, so your argument is not applicable. And if we need to change implementation details in a minor version, we will. It is valid for the developers to say "go away" if you used a protected property and it broke. As my favorite article says, notionally-private methods were clearly labelled with underscores and nobody seemed to get confused.

Drupal has used underscores for private functions since forever, and again everyone understood that these are not meant to be called -- and yet they were, as appropriate.

You can happily code OOP without protected blankies. Setting bear traps beneath the windows is wasted effort.

All your talk does not change the fact that there is already the connection issue (which you debate, but that does not change the issue, http://drupal.org/node/802514, which two high-profile contributors have already stumbled upon), which proves that sometimes it's just easier to change a private property and move on. With the reflection-magery in PHP 5.3 the PHP team gave in too. I recommend you give in too. Lengthy theoretical posts do not change practice.
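
For readers unfamiliar with the reflection escape hatch mentioned here, a minimal sketch follows. ExampleConnection and its prefix property are hypothetical and are not taken from the Drupal issue linked above. ReflectionProperty::setAccessible() is the PHP 5.3 addition; from PHP 8.1 on the call is no longer needed, so the sketch guards it with a version check.

```php
<?php
// Hypothetical class with a protected implementation detail.
class ExampleConnection {
  protected $prefix = 'drupal_';
}

$connection = new ExampleConnection();
$property = new ReflectionProperty(ExampleConnection::class, 'prefix');

// Required before PHP 8.1; later versions allow reflective access by default.
if (PHP_VERSION_ID < 80100) {
  $property->setAccessible(true);
}

// Read and overwrite the protected property from outside the class.
$before = $property->getValue($connection);
$property->setValue($connection, 'test_');
$after = $property->getValue($connection);
```

Whether using this counts as a legitimate last resort or as breaking the contract is exactly the disagreement in this thread.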

More...

If you're using your classes as objects, then protected/private properties make total sense and anything else is silly.

Yes. Enforcing this, however, is silly too. You are, as I said before, guilty of hubris: you think your API covers every case. Most cases, yes. Let me quote my favorite article:

I think that's allowed when you're not acting as a normal user: mechanics should be allowed to rummage around all they like.

You forget that there are millions of Drupal sites out there, and we know from our most complex API (form API) that there ARE cases we have not thought of before. Because of that we provide millions of points where you can act, and even that is not enough. Think of #required and a back button... that's solved only in D7 and broken in 4.7, 5, 6. You will lock us in similar traps. All I am asking for is not to lock down. The developers using those APIs will treat the underscored properties as something not to be touched. They won't be documented much and so on. But when push comes to shove, they are not locked in. That's what matters.

To sum up

As the API designer, please (to the contrary of what you state) use private/protected properties. But please, don't lock them in. Please don't believe your API will do everything. And please don't believe your users are idiots. If your API is as good as you they will almost never need to touch anything underscored. And that almost is what this turns around on.

Extending

When you program procedurally you need access to the code as it sits. Procedural code is final code. With OOP you can extend a class, make your changes, and then register that class for use. In the case of the Drupal database layer, you can create a new class extending an existing class, make changes, and then use this class instead.

This is a different way of making changes and having access to the data. Why is this not a valid approach?

One note is that if you

One note is that if you implement an interface or extend an abstract method, you cannot change the visibility in the new method signature.

object visibility

You cannot change the signature but you can add getters and setters. Since we are talking about protected properties and not private ones, the additional getters and setters will have access to them.

Doing this also provides the benefit of providing an interface to data. If the internal properties change in a point release, the extended class's new methods can deal with these changes.
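
A short sketch of that idea (PHP 7.1+ syntax), using a hypothetical BaseQuery class rather than any real Drupal class: the subclass adds an accessor for a protected property without touching the parent.

```php
<?php
// Hypothetical base class, as a library might ship it.
class BaseQuery {
  protected $conditions = [];

  public function condition(string $field, $value): self {
    $this->conditions[$field] = $value;
    return $this;
  }
}

// Site-specific subclass: adds an accessor the base class does not offer.
// It works because the property is protected rather than private.
class InspectableQuery extends BaseQuery {
  public function getConditions(): array {
    return $this->conditions;
  }
}

$query = new InspectableQuery();
$conditions = $query->condition('status', 1)->getConditions();
```

If a later release renamed or restructured the protected property, only getConditions() would need to change; code calling it would not.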

Why change It

If you are defining an interface, everything should be public, so why would you need to change the visibility? If you are changing a public method in an interface to protected or private, you shouldn't be implementing the interface. And I don't know why you would have an interface with a protected or private method. The interface defines the public API, and as such, protected and private methods should be at the discretion of the implementing class.

Small off-topic question

chx, sorry to take this off-topic, but I'm not familiar with the back button / #required element fix in the D7 FAPI you're mentioning. I've been running into this problem myself with the Commerce checkout form and am wondering if you can spare a moment to point me to the pertinent issue or doc. Thanks! : )

OO approaches

IMO this whole 'debate' in PHP stems from PHP's typically confused notion of what it actually is. It is a weakly and dynamically typed language intended to be flexible and hackable, yet when it goes and designs an OO architecture for PHP5 it seems to copy Java's notion of one - which was part of a strongly and statically typed language intended to be very restrictive for avoiding the supposed C++ pitfalls.

Having private and protected restrictions on OO things is not strictly an example of OO vs procedural thinking - it is an example of one style of OO thinking brought on more by popular languages like C++ and Java than anything inherent to OO, i.e. an implementation detail.

OO in dynamic languages tends to have fewer of these restrictions - e.g. Python (which is arguably more OO than Java) has no enforced restrictions, Smalltalk (which is definitely more OO than Java) has some restrictions on some things but generally encourages enabling rather than restricting, and (as far as I understand) Perl doesn't restrict much.

I tend to agree with chx's opinion that restricting visibility isn't a very 'Drupal' thing to do. Drupal has traditionally been one of the most hackable CMS codebases around.

But my opinion could just stem from my own biases towards Python. Python's attitude is 'We're all consenting adults' - i.e. you're welcome to do what you like, but if you break something it is your own problem.

Convert back to serial code

We could completely ban private variables by banning both objects and functions.

Most developers are happy with variables that are private to a function, so where is the problem with variables that are private to an object? A lack of understanding is one issue. Private variables in functions are easy to understand. Private variables in objects are harder to understand, and there are few examples of well documented variable use in Drupal 6's few classes.

I managed to finish one semester of Java at university but spent most of the time explaining "hello world" examples to fellow students. PHP has the advantage that you can start an Introduction to Programming course by learning programming instead of memorising a mess of language specific oddities. You can then grow your knowledge in any direction you need to build an application.

Drupal needs to grow OO and needs well documented role models for objects. Rules are useless without understandable working examples. D6 was sadly lacking. Are there examples in D7?

DB layer

I'd say the DB layer in Drupal 7 is a good example of Drupal OO done well, but I am admittedly biased as I was its main architect. The cache system and queue system are both now OO, and while their implementations aren't ideal (in the case of caching because it's so low-level) they're still a solid move in the right direction for those systems.

The Butler project aims to be another good, generic example to both build from and model on. It's still in early planning stages, though, but hopefully by the time Drupal 7 is really being used in earnest there will be something there.

For Drupal 6, the Feeds module is another module that, while not perfect, does a lot of things well with regards to OO.

"Most developers are happy

"Most developers are happy with variables that are private to a function so where is the problem with variables that are private to an object?"

How is that fair? Object properties are closer to globals than they are to variables that only have scope of one level.

Not true

How do you get object properties as globals? That's not even remotely close. They are closer to static variables within a function, but even that is a very, very poor match. Object properties are part of a meta-variable called an object. They are neither globals nor are they statics.

Trying to treat them as something they are not is how you get ugly, buggy, unperformant code that you cannot maintain.

Java as a bad influence on PHP

It's hardly surprising that most of the issues related to OO in PHP come from the fact that they just copied Java. It uses a message passing model, and is thus incompatible with the AOP concepts implemented in Drupal. If they had gone another route, and implemented a generic function model, then this mess of private/protected/public wouldn't exist. You define the specificity of each method when you declare it. It's a safer model, because instead of relying on a "magic" mechanism of information hiding and encapsulation it makes the programmer completely responsible for the scope of the method. It's a model at odds with Java, which treats the programmer as a suspect and tries to hold his hand in the right way. The problem is that it adds a layer of complexity and muddled concepts that in the end result in code more difficult to understand and also much more prone to bugs and security issues.

I've had moments when

I've had moments when accessing object properties as if they were public object properties would make the code base more consistent. Especially, maybe, if asking whether database rows are being returned as arrays or objects by default?

The cumbersome part here for me is how to get a specific set of an object's properties. Suppose it contains the database row set and some additional properties, but you want to quickly get all of just the database row properties? Hence it seems that storing all the properties that one might like to make available within an internal protected struct or array, and providing convenience methods to access these properties as if they were public via __get or offsetGet, with an available getAttributes() method that would return an array of them.

(array) $obj currently doesn't seem to provide a means to quickly return this limited set.

Don't exist yet

The properties you're after may not exist yet. That's the point. They may not exist until they're requested, thus saving a lot of time if they're not needed.

Database records in a result set are a perfect example, actually. They are not pulled from the database and queued up into an array in PHP space. They are, depending on a flag on the connection, either pulled into PHP C-code space and stored there, or by default left on the database server and only retrieved from there once as requested. So there is no $result->arrayOfEverything for you to access. It doesn't exist.

If you want all records as an array, the D7 DB layer provides a single method to do just that: $result->fetchAll(). (That's actually a PDO feature; the Drupal layer adds several other variants of that.) That generates the array you're looking for on the fly and returns it, without wasting time generating it if you don't want it.

Hence it seems that storing all the properties that one might like to make available within an internal protected struct or array.

At what cost? The system should calculate 14 different ways of structuring the data, including potentially contacting 3rd party ReST or SOAP services, on the off chance that someone might need them? That's horribly inefficient.

By abstracting that behind a method, you can let the object calculate on-the-fly, in the most efficient way, what it is you need from it. So the 2 pieces of information you care about can be calculated either early or late (you don't need to worry about that), and the other 12 never need to happen. That saves a lot on performance and allows the author of the class to change how and when those are calculated to improve performance without forcing you to change your code.
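
A minimal sketch of that kind of on-demand calculation (PHP 7.1+ syntax). SalesReport is a hypothetical class, and the computationCount() method exists only so the sketch can show how often the work actually ran.

```php
<?php
// Hypothetical report object: callers ask for what they need, and the
// object decides when (and whether) to do the work.
class SalesReport {
  private $orders;
  private $total = null;
  private $computations = 0;

  public function __construct(array $orders) {
    $this->orders = $orders;
  }

  // Computed on first request, then reused on later calls.
  public function getTotal(): int {
    if ($this->total === null) {
      $this->total = array_sum($this->orders);
      $this->computations++;
    }
    return $this->total;
  }

  // Present only to make the laziness observable in this sketch.
  public function computationCount(): int {
    return $this->computations;
  }
}

$report = new SalesReport([10, 20, 30]);
$first = $report->getTotal();
$second = $report->getTotal();
```

The class author could later switch to computing the total eagerly in the constructor, or fetching it from elsewhere, and callers of getTotal() would not notice.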

The cumbersome part here for me is how to get a specific set of an object's properties.

That's the point. If you as a user of the API are thinking in terms of the object's properties, you're doing it wrong. :-) (OK, not wrong, but class-as-struct, not class-as-object.) You should be thinking in terms of the object's behaviors, its actions. Those are methods, not properties. There are no properties.

(array)$obj wouldn't do what you want, because that casts the (public?) properties of the object into an array. That's what it's documented to do. I don't know why you'd expect that to magically know which subset of internal properties that may or may not even exist you want.
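
On the "(public?)" question, a quick sketch with a hypothetical ExampleRow class shows what the cast does: the array cast includes protected and private properties too, under keys carrying NUL-byte prefixes, whereas get_object_vars() called from outside the class returns only the public ones. Neither can guess which subset a caller has in mind.

```php
<?php
// Hypothetical class with one property of each visibility.
class ExampleRow {
  public $title = 'Hello';
  protected $cache = 'internal';
  private $secret = 'hidden';
}

$row = new ExampleRow();

// The cast keeps every property; non-public names are prefixed:
// "\0*\0cache" for protected, "\0ExampleRow\0secret" for private.
$castKeys = array_keys((array) $row);

// Called from outside the class, this sees public properties only.
$visible = get_object_vars($row);
```

The mangled keys make the cast awkward to rely on for anything but debugging, which fits the argument for asking the object through a method instead.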

I like your original post - the issue isn't specific to Drupal.

By abstracting that behind a method

It doesn't have to be a method. In PHP, it is possible to pre-define that variable/property with a relational mapper that upon first access will then retrieve the desired info (database, web service, etc), e.g. their posts, comments etc.

An example might be a session user object.

How?

Are you referring to the __get() method? That's honestly a hack. It's slower than a normal method, and still in most cases presumes a 1:1 correlation between internal and external properties. If that's not the case, then you may as well just use a method and be done with it. It's faster and clearer what you're doing.
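
For comparison, a small sketch of both styles (PHP 7.1+ syntax), using hypothetical MagicUser and ExplicitUser classes. __get() is PHP's magic method for reads of inaccessible or undefined properties; the sketch makes no attempt to measure the speed difference claimed above.

```php
<?php
// Magic style: reads of inaccessible properties are routed through __get().
class MagicUser {
  private $data = ['name' => 'admin'];

  public function __get($name) {
    // Unknown names quietly yield null, which can hide typos.
    return $this->data[$name] ?? null;
  }
}

// Explicit style: the available behavior is visible in the class itself.
class ExplicitUser {
  private $name = 'admin';

  public function getName(): string {
    return $this->name;
  }
}

$magic = new MagicUser();
$explicit = new ExplicitUser();

$viaMagic = $magic->name;
$viaMethod = $explicit->getName();
$typo = $magic->nmae;
```

The explicit method fails loudly when misspelled, while the magic accessor returns null for the misspelled nmae above.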

I could not agree more.

I could not agree more. However, I would be happy if we would not need this discussion, and could move on with more interesting stuff.
I miss your "eval is evil" feedback on the "stapler" thing :)

Object Oriented Programming

> Why bother having matching getters and setters for every property? That breaks encapsulation, one of
> the key reasons for using an object approach in the first place, and is one of the reasons why JavaBeans
> are, in most cases, a horribly bad model to follow.

I agree in full with your observation, and this is just typical of the lack of understanding of those who just don't understand OOP and why, in my opinion, they'll never really (ever) be a web developer... not at a professional level at least.

There are so many "experts" today, but expert at what? Anyone can turn to the internet and claim to be an expert at one thing or another, yet these "experts" lack the basics of what constitutes a programming language, never mind having any skill at whatever level.

I usually refer to those people as amateurs; damn would-be Jedis make a complete a*se of it and the rest of us have to clean up after them... those responsible for WordPress are a prime example of doing it wrong.

Overly harsh

I think you're being overly harsh. The claim that "you're not really a web developer if you aren't doing complex OOP" is just as closed minded an approach as "blech, that evil OOP is too inflexible". That's the point I'm trying to make. Both approaches, if used properly, can be extremely powerful and effective, but you need to understand the differences in those approaches in order to leverage them properly. Trying to use one as if it were the other is a great way to end up with spaghetti code.