Skip to content

Conversation

@ftomassetti
Copy link
Member

This is an initial step to reorganize the World class. Other refactoring will follow but I thought it would be better to start discussing and merging this one.

Now most of data is grouped either under generation_params or in layers.

A Layer represent a piece of information available for each cell. Each layer has a matrix of data, some of them have also thresholds while others have quantiles.

The idea is to treat layers in the more homogeneous way possible.

@tcld
Copy link
Contributor

tcld commented Nov 19, 2015

Oh. My. God. :D I would vote for merging everything else first so I don't have to fix all of my PRs.^^
Aside from that this is certainly a good idea; the tests were cancelled for some reason, though.

To make sure that there are no bugs left: How about you add another test that basically does what data_generator.py does (i.e. generates a world exactly like the one in there)? Then you could compare the "new" world with the "old" one, and if everything checks out, the new code should be fine. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I would suggest introducing a reference to the ocean at the very top of the function; it will make thinks more readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ftomassetti
Copy link
Member Author

Given this one would have quite an impact it would nice to have the opinion of @psi29a before merging it

@tcld
Copy link
Contributor

tcld commented Nov 19, 2015

I looked over it once, comments above.

I guess this should be merged as soon as possible, since it will break every piece of code that isn't already merged. (I would prefer if #188 and #203 could be merged first, so I don't have to invest more work... but overall it might even be easier if they were merged after this PR.)

I also want to mention all the used "thresholds" again - it would be really nice if they could just be stored via for-loop. It would make it slightly easier to modify/add thresholds.

@ftomassetti
Copy link
Member Author

About:

To make sure that there are no bugs left: How about you add another test that basically does what data_generator.py does (i.e. generates a world exactly like the one in there)? 

I think that would not work because platec could give different results on different platforms (for example because of rounding or different optimization with floating numbers)

@tcld
Copy link
Contributor

tcld commented Nov 19, 2015

Maybe if you started from a plate-only world?

@psi29a
Copy link
Member

psi29a commented Nov 20, 2015

Great googly moogly... I'll give this an eye-ball around lunch today.

Can you give a brief overview of what you are trying accomplish with this PR? That way I have a clue as to what I'm looking at. :) Not that what you've posted above isn't good enough, this is just a large PR. Message me on gtalk.

@ftomassetti
Copy link
Member Author

GitHub did not told me you edited the comment... I will reach you on GTalk. The main idea is to put some order in the World class:

  • I created a class named GenerationParameters: ideally from an instance of this class we should be able to regenerate a world
  • I collected all the different kinds of data in a map of layers. So you have layers['elevation'], layers['ocean'] and so on

Each layer is an instance of Layer, LayerWithThresholds or LayerWithQuantiles. All of these classes have a field named data which is a numpy array. Then they could have also thresholds or quantiles :)

The idea is to make layers to be more homogeneous: before world.elevation['data'] and world.ocean were numpy arrays. Now you would access both as world.layers['elevation'].data and world.layers['ocean'].data.

When we add a new piece of information (e.g., wind) we just add a new layer.

Aside from that I tried to use encapsulation when possible, so using method such as world.is_ocean(pos) instead of accessing directly variables (e.g., world.ocean[y, x] or world.elevation['data'][y, x]). This should make the code more explicit and more robust to changes. I left direct access to internal variables where it made sense for performance reasons.

@tcld
Copy link
Contributor

tcld commented Nov 23, 2015

Since you mention performance, I wrote a little script yesterday. It accesses an array 50 million times via different means, that's around the order of accesses WorldEngine performs for a medium-sized world.
Accessing an array via a separate function is not that bad compared to the overall generation time (it might amount to ~10%).

System: linux2
Python: v2.7.6
 Numpy: v1.8.2

                   Empty function (50.0M calls):  5.869s
         Direct access via [y, x] (50.0M calls):  8.574s
                  Function access (50.0M calls): 15.836s
Function access w/ unpacked tuple (50.0M calls): 14.737s
         Direct access via [y][x] (50.0M calls): 15.560s

 Python for-loop (10 iterations):  0.885s
  Numpy for-loop (10 iterations):  5.684s
Numpy while-loop (10 iterations):  4.593s
System: linux
Python: v3.4.3
 Numpy: v1.9.2

                   Empty function (50.0M calls):  5.736s
         Direct access via [y, x] (50.0M calls):  6.701s
                  Function access (50.0M calls): 11.592s
Function access w/ unpacked tuple (50.0M calls): 11.420s
         Direct access via [y][x] (50.0M calls): 12.282s

 Python for-loop (10 iterations):  0.995s
  Numpy for-loop (10 iterations):  4.619s
Numpy while-loop (10 iterations):  3.939s

An additional suggestion, to keep things clear and clean, would be to add new functions to World to request a view on the whole array at once, similar to one of these:

def temperature(self, x1=None,y1=None,x2=None,y2=None):
    return self.layers['temperature'].data[y1:y2+1, x1:x2+1]

def temperature(self):
    return self.layers['temperature'].data

def layer(self, layer):
    return self.layers[layer].data

So when somebody wants to do fancy stuff with numpy, e.g. output = w.layers['temperature'] / w.layers['humidity'], you can still hide away the actual access of the array and keep control within ẁorld.py 100% of the time: output = w.temperature() / w.humidity()

EDIT: I see you already had the same idea. I guess, this is the time and place to do this kind of stuff.^^

@ftomassetti
Copy link
Member Author

We need to merge this one :)

@psi29a psi29a added this to the v0.19.1 milestone Dec 3, 2015
@psi29a psi29a self-assigned this Dec 3, 2015
@psi29a
Copy link
Member

psi29a commented Dec 3, 2015

No real comments, and looks good to me.

psi29a added a commit that referenced this pull request Dec 3, 2015
@psi29a psi29a merged commit b8dc312 into master Dec 3, 2015
@ftomassetti
Copy link
Member Author

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants